talrid23 Profile
talrid23

@talrid23

Followers
379
Following
1K
Media
69
Statuses
496

Deep learning researcher, @QodoAI

Israel
Joined April 2020
@talrid23
talrid23
10 months
💡 So how is the new Claude 3.5 Sonnet model on real-world code tasks? We conducted extensive performance testing of Claude 3.5 Sonnet on a dataset of pull request code. 🔬 Our Test Setup: - We aggregated 200 pull requests spanning multiple repositories and programming
5
1
12
@talrid23
talrid23
18 days
RT @QodoAI: How to get the most out of Qodo Merge for PR reviews 🧵 Here’s a breakdown of the most powerful workflows in Qodo Merge 👇
0
7
0
@talrid23
talrid23
28 days
Read more about Qodo Merge PR Benchmark here:
qodo-merge-docs.qodo.ai
0
0
1
@talrid23
talrid23
28 days
So, how is the new Grok-4 model on real-world coding tasks? The verdict? It's solid, but not a revolution. While Grok-4 shows impressive gains on academic benchmarks, established 'thinking' models like o3 and Gemini-Pro still outperform it on more practical coding tasks like
1
1
3
@talrid23
talrid23
1 month
Chat vs Agent Mode: Are Most AI Tools Getting It Wrong? 🤔 I'm finding myself preferring 'chat' for most of my daily AI interactions. It just feels more natural and efficient for the majority of coding tasks. But is "chat" mode the opposite of "agentic" mode? What are the real.
0
0
2
@talrid23
talrid23
2 months
Note that this benchmark not only provides a final score, but also analyzes the strengths and weaknesses of each model. This surfaced, for example, a specific problem of the new codex-mini model in following instructions and generating structured output reliably:
0
0
0
@talrid23
talrid23
2 months
Qodo Merge PR Benchmark:
qodo-merge-docs.qodo.ai
1
0
0
@talrid23
talrid23
2 months
🤔 OpenAI's Code Model Confusion: Four Models, No Clear Winner. After releasing codex-mini last week, OpenAI now has four (!) code models, with significantly different speed-quality tradeoffs. Instead of one solid solution, we get a fragmented landscape: ⚡ GPT-4.1 - Their
1
0
1
@talrid23
talrid23
2 months
RT @itamar_mar: #OpenSource is still very awesome. Even in the age of LLMs, which can generate entire projects from a single prompt. Just…
0
2
0
@talrid23
talrid23
2 months
With 1024 thinking tokens, Gemini Pro is still expected to beat Claude 4 Sonnet (which has no reasoning).
0
0
1
@talrid23
talrid23
2 months
Qodo Merge PR Benchmark.
qodo-merge-docs.qodo.ai
0
0
0
@talrid23
talrid23
2 months
The new Gemini-pro-06-05 model includes an option to control the thinking token budget, making it viable for latency-constrained environments. But how does this new model apply to real-life problems? Well, it depends dramatically on how much it's allowed to think. 🤔 Using the
3
2
8
@talrid23
talrid23
2 months
RT @QodoAI: Pro tip: turn review feedback into ready-to-commit code changes instantly with Qodo Merge's /implement command.
0
296
0
@talrid23
talrid23
3 months
RT @QodoAI: Qodo Gen now supports Claude Sonnet 4 & Opus 4! This latest model takes its coding capabilities to the next level — excelling a…
0
5
0
@talrid23
talrid23
3 months
And here is an example of a more detailed analysis of Gemini 2.5 Pro vs. Claude Opus 4.0:
0
0
5
@talrid23
talrid23
3 months
And Claude's top model, Opus, is no match for Gemini-2.5-Pro.
0
1
5
@talrid23
talrid23
3 months
While Sonnet-4.0 is indeed better than Sonnet-3.7, it is still inferior to GPT-4.1.
0
0
6
@talrid23
talrid23
3 months
So, how are the new Claude models on real-world coding tasks? We benchmarked them on the 'Qodo Merge Pull Request Benchmark,' a dataset designed to evaluate a model's ability to provide meaningful code suggestions for various PRs. The results are .
5
2
30
@talrid23
talrid23
3 months
🚀 Excited to announce PR-Agent v0.29 🚀 - Enhanced Access: We've upgraded PR-Agent to support all the newest top-tier AI models and frameworks, ensuring you have access to the most advanced tools. - Broader Platform Support: Enjoy improved compatibility across multiple git.
1
0
1