talrid23 @talrid23 X Profile

talrid23

@talrid23

Followers

379

Following

1K

Media

69

Statuses

496

Deep learning researcher, @QodoAI

Israel

Joined April 2020

talrid23

@talrid23

10 months

💡 So how is the new Claude 3.5 Sonnet model on real-world code tasks? We conducted extensive performance testing of Claude 3.5 Sonnet on a dataset of pull request code. 🔬 Our Test Setup: - We aggregated 200 pull requests spanning multiple repositories and programming

5

1

12

talrid23

@talrid23

2 days

RT @foojayio: Check it out, @QodoAI and @MongoDB articles hot off the press, both on #Java and AI, "How to Make a RAG Application With @lan…

foojay.io

Get hands-on experience with the exact code examined in this article, along with exercises, debugging techniques, and best practices for production deployment.

0

2

0

talrid23

@talrid23

18 days

RT @QodoAI: How to get the most out of Qodo Merge for PR reviews 🧵 Here’s a breakdown of the most powerful workflows in Qodo Merge 👇

0

7

0

talrid23

@talrid23

28 days

Read more about Qodo Merge PR Benchmark here:

qodo-merge-docs.qodo.ai

0

1

talrid23

@talrid23

28 days

So, how is the new Grok-4 model on real-world coding tasks? The verdict? It's solid, but not a revolution. While Grok-4 shows impressive gains on academic benchmarks, established 'thinking' models like o3 and Gemini-Pro still outperform it on more practical coding tasks like

1

3

talrid23

@talrid23

1 month

Chat vs Agent Mode: Are Most AI Tools Getting It Wrong? 🤔 I'm finding myself preferring 'chat' for most of my daily AI interactions. It just feels more natural and efficient for the majority of coding tasks. But is "chat" mode the opposite of "agentic" mode? What are the real

0

2

talrid23

@talrid23

2 months

Note that this benchmark not only provides a final score but also analyzes the strengths and weaknesses of each model. For example, it surfaced a specific problem with the new codex-mini model: it does not reliably follow instructions or generate structured output:

0

talrid23

@talrid23

2 months

Qodo Merge PR Benchmark:

qodo-merge-docs.qodo.ai

1

0

talrid23

@talrid23

2 months

🤔 OpenAI's Code Model Confusion: Four Models, No Clear Winner. After releasing codex-mini last week, OpenAI now has four (!) code models, with significantly different speed-quality tradeoffs. Instead of one solid solution, we get a fragmented landscape: ⚡ GPT-4.1 - Their

1

0

1

talrid23

@talrid23

2 months

RT @itamar_mar: #OpenSource is still very awesome. Even in the age of LLMs, which can generate entire projects from a single prompt. Just…

0

2

0

talrid23

@talrid23

2 months

With 1024 thinking tokens, Gemini Pro is still expected to beat Claude Sonnet 4 (which has no reasoning).

0

1

talrid23

@talrid23

2 months

Qodo Merge PR Benchmark.

qodo-merge-docs.qodo.ai

0

talrid23

@talrid23

2 months

The new Gemini-pro-06-05 model includes an option to control the thinking token budget, making it viable for latency-constrained environments. But how does this new model apply to real-life problems? Well, it depends dramatically on how much it's allowed to think. 🤔 Using the

3

2

8

talrid23

@talrid23

2 months

RT @QodoAI: Pro tip: turn review feedback into ready-to-commit code changes instantly with Qodo Merge's /implement command.

0

296

0

talrid23

@talrid23

3 months

RT @QodoAI: Qodo Gen now supports Claude Sonnet 4 & Opus 4! This latest model takes its coding capabilities to the next level, excelling a…

0

5

0

talrid23

@talrid23

3 months

And here is an example of a more detailed analysis of Gemini 2.5 Pro Vs Claude Opus 4.0:

0

5

talrid23

@talrid23

3 months

And Claude's top model, Opus, is no match for Gemini-2.5-Pro.

0

1

5

talrid23

@talrid23

3 months

While Sonnet-4.0 is indeed better than Sonnet-3.7, it is still inferior to GPT-4.1.

0

6

talrid23

@talrid23

3 months

So, how are the new Claude models on real-world coding tasks? We benchmarked them on the 'Qodo Merge Pull Request Benchmark,' a dataset designed to evaluate a model's ability to provide meaningful code suggestions for various PRs. The results are .

5

2

30

talrid23

@talrid23

3 months

Install PR-Agent v0.29 easily here: . The full release notes:

github.com

codiumai/pr-agent:0.29 codiumai/pr-agent:0.29-github_app codiumai/pr-agent:0.29-bitbucket-app codiumai/pr-agent:0.29-gitlab_webhook codiumai/pr-agent:0.29-github_action codiumai/pr-agent:0.29-azure...

0

talrid23

@talrid23

3 months

🚀 Excited to announce PR-Agent v0.29 🚀 - Enhanced Access: We've upgraded PR-Agent to support all the newest top-tier AI models and frameworks, ensuring you have access to the most advanced tools. - Broader Platform Support: Enjoy improved compatibility across multiple git

1

0

1