André Silva

@andre15silva_

Followers
142
Following
7K
Media
17
Statuses
1K

PhD at KTH 🧑‍🍳 ML on Code

Stockholm, Sweden
Joined November 2015
@andre15silva_
André Silva
23 days
It should be possible to edit model responses in chat interfaces, e.g. for editing the plan in Cursor and Claude Code.
0
0
3
@andre15silva_
André Silva
1 month
RT @bjarnihaukur11: The funniest (unintentional) reward hack I saw while training my coding agent: it "rm -rf"'d the repo it was working on…
0
1
0
@andre15silva_
André Silva
3 months
Read more about GBPR, our experiments, challenges and future directions at
0
0
2
@andre15silva_
André Silva
3 months
Our experiments on 1466 buggy programs show that GBPR can repair a large number of buggy programs to near-perfect accuracy. As the picture shows, gradient descent navigates the "correctness landscape", iteratively moving from buggy (high loss) to repaired (low loss) behavior.
Tweet media one
1
0
1
@andre15silva_
André Silva
3 months
How GBPR works:
1️⃣ Compile the symbolic program into a differentiable numerical form (e.g. a neural network).
2️⃣ Define a "correctness loss" based on the desired behavior.
3️⃣ Use gradient descent to adjust the numerical program's parameters, minimizing the loss. 📉➡️✅
1
0
0
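A minimal sketch of the three steps above, on a toy example rather than the paper's actual setup. Everything here is an assumption made for illustration: the "program" is just a line `y = w * x + b` whose parameters are wrong, the correctness loss is mean squared error over input/output examples, and the names `run_program`, `correctness_loss`, `loss_gradient`, and `repair` are hypothetical. The real method compiles symbolic programs into a differentiable form, which this sketch skips.

```python
# Toy illustration only, not the GBPR implementation.
# The "numerical program" is y = w * x + b; the bug is a wrong (w, b).

def run_program(params, x):
    """Step 1 (stand-in): a program already in numerical, differentiable form."""
    w, b = params
    return w * x + b

def correctness_loss(params, examples):
    """Step 2: mean squared error between actual and desired outputs."""
    return sum((run_program(params, x) - y) ** 2 for x, y in examples) / len(examples)

def loss_gradient(params, examples):
    """Analytic gradient of the loss with respect to (w, b)."""
    w, b = params
    n = len(examples)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
    return grad_w, grad_b

def repair(params, examples, learning_rate=0.05, steps=2000):
    """Step 3: plain gradient descent on the program's parameters."""
    w, b = params
    for _ in range(steps):
        grad_w, grad_b = loss_gradient((w, b), examples)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Desired behavior (assumed for the example): y = 2x + 1.
examples = [(x, 2 * x + 1) for x in range(-3, 4)]
buggy = (3.0, 0.0)                 # buggy parameters: high loss
repaired = repair(buggy, examples) # repaired parameters: low loss

print("loss before:", correctness_loss(buggy, examples))
print("loss after: ", correctness_loss(repaired, examples))
```

The loss drops as the parameters move from the buggy values toward ones that reproduce the desired behavior, which is the "high loss to low loss" trajectory described in the thread, shown here on a deliberately trivial program.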
@andre15silva_
André Silva
3 months
Our core idea with GBPR: reframe program repair as continuous optimization in a continuous program space. Instead of discrete token edits, we "steer" programs towards correctness by optimizing their numerical representations based on a correctness loss.
1
0
0
@andre15silva_
André Silva
3 months
Program repair involves searching discrete symbolic spaces, lacking a direct way to optimize for program behavior. What if we could change that? 📄 Introducing our new paper "Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces"
Tweet media one
1
1
1
@andre15silva_
André Silva
3 months
RT @menhguin: I picked a great time to switch to Cursor for LaTeX editing.
0
1
0
@andre15silva_
André Silva
5 months
3️⃣ Llama 4's underwhelming performance. Meta's release of Llama 4 was highly anticipated. However, its performance on RepairBench indicates that it does not outperform gpt-4o models and only marginally improves over llama-3.1-405b, contradicting expectations based on LMArena scores.
1
0
0
@andre15silva_
André Silva
5 months
2️⃣ Gemini 2.5 Pro shows good progress from Google. Google's Gemini 2.5 Pro has demonstrated improvements over its predecessors, with a Plausible@1 score of 38.3% vs. 33.2% for the previous generation. Despite this, it still falls short of the Claude and DeepSeek models.
1
0
0
@andre15silva_
André Silva
5 months
1️⃣ Quasar-Alpha is a strong contender. Quasar-alpha, a stealth model available on OpenRouter, has made the news. With a Plausible@1 score of 40.5%, quasar-alpha is approaching the performance of leading models like Claude-3.5. The big question is: who is behind quasar-alpha?
1
0
0
@andre15silva_
André Silva
5 months
🔧 This week on RepairBench:
- Quasar-Alpha is a strong contender.
- Gemini 2.5 Pro shows good progress from Google.
- Llama 4's underwhelming performance.
Tweet media one
1
0
3
@andre15silva_
André Silva
6 months
Just added claude 3.7 sonnet (non-thinking mode) to repairbench! Strong improvement and almost on par with o3-mini and deepseek-r1. Wonder how far the thinking mode will land from the current top.
Tweet media one
1
0
3
@andre15silva_
André Silva
6 months
(probably good evidence that this new generation of models can handle more autonomy than the previous one).
0
0
0
@andre15silva_
André Silva
6 months
claude 3.7 sonnet working so hard that even copilot thinks it's stuck
Tweet media one
1
0
1
@andre15silva_
André Silva
6 months
RT @mokita_j: 🔥 sb-heists is now featured in this week's @blockthreat newsletter! 🔍 91 reproducible exploits for 9 blockchain vulnerabilitie…
0
1
0
@andre15silva_
André Silva
7 months
RT @martinmonperrus: OpenAI strikes back and reclaims first place 🥇 on the RepairBench leaderboard for automated bug fixing. https://t.co/…
0
1
0