André Silva @andre15silva_ X Profile

André Silva

@andre15silva_

Followers

142

Following

7K

Media

17

Statuses

1K

PhD at KTH 🧑‍🍳 ML on Code

Stockholm, Sweden

Joined November 2015

André Silva

@andre15silva_

23 days

It should be possible to edit model responses in chat interfaces, e.g. for editing the plan in Cursor and Claude Code.

0

3

André Silva

@andre15silva_

1 month

RT @bjarnihaukur11: The funniest (unintentional) reward hack I saw while training my coding agent: it "rm -rf"'d the repo it was working on…

0

1

0

André Silva

@andre15silva_

3 months

Read more about GBPR, our experiments, challenges and future directions at

0

2

André Silva

@andre15silva_

3 months

Our experiments on 1466 buggy programs show that GBPR can repair a large number of buggy programs to near-perfect accuracy. As the picture shows, gradient descent navigates the "correctness landscape", iteratively moving from buggy (high loss) to repaired (low loss) behavior.

1

0

1

André Silva

@andre15silva_

3 months

How GBPR works:
1️⃣ Compile the symbolic program into a differentiable numerical form (e.g. a neural network).
2️⃣ Define a "correctness loss" based on the desired behavior.
3️⃣ Use gradient descent to adjust the numerical program's parameters, minimizing the loss. 📉➡️✅

1

0
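A toy sketch of the three steps in the post above, in plain Python. It is not the paper's method or code: the "program" here is already numerical (a line with two constants) rather than a symbolic program compiled into a neural network, and the names `run_program`, `correctness_loss`, and `repair` are hypothetical. It only shows how a correctness loss and gradient descent can move buggy parameters toward the desired behavior.

```python
# Toy illustration only; all names are hypothetical, not taken from GBPR.
# Intended behavior: f(x) = 2*x + 1. The buggy version has wrong constants.

def run_program(params, x):
    """Step 1 (stand-in): a program whose behavior depends on numeric parameters."""
    w, b = params
    return w * x + b

def correctness_loss(params, tests):
    """Step 2: mean squared error between actual and expected outputs."""
    return sum((run_program(params, x) - y) ** 2 for x, y in tests) / len(tests)

def repair(params, tests, lr=0.05, steps=2000):
    """Step 3: gradient descent on the parameters, using hand-derived gradients."""
    w, b = params
    n = len(tests)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in tests) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in tests) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

# Input/output pairs describing the desired behavior (the "test suite").
tests = [(x, 2 * x + 1) for x in range(-3, 4)]
buggy = (3.0, 0.0)
repaired = repair(buggy, tests)

print("loss before:", correctness_loss(buggy, tests))
print("loss after: ", correctness_loss(repaired, tests))
```

The loss falls as the parameters approach the intended constants, which is the "high loss to low loss" movement the thread describes, here on a two-parameter toy.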

André Silva

@andre15silva_

3 months

Our core idea with GBPR: reframe program repair as continuous optimization in a continuous program space. Instead of discrete token edits, we "steer" programs towards correctness by optimizing their numerical representations based on a correctness loss.

1

0

André Silva

@andre15silva_

3 months

Program repair involves searching discrete symbolic spaces, lacking a direct way to optimize for program behavior. What if we could change that? 📄 Introducing our new paper "Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces"

1

André Silva

@andre15silva_

3 months

RT @menhguin: I picked a great time to switch to Cursor for LaTeX editing.

0

1

0

André Silva

@andre15silva_

5 months

These and other updates available on

repairbench.github.io

Explore RepairBench, the leaderboard of frontier models for program repair.

0

André Silva

@andre15silva_

5 months

3️⃣ Llama 4's underwhelming performance. Meta's release of Llama 4 was highly anticipated. However, its performance on RepairBench indicates that it does not outperform gpt-4o models and only marginally improves over llama-3.1-405b, contradicting expectations based on LMArena scores.

1

0

André Silva

@andre15silva_

5 months

2️⃣ Gemini 2.5 Pro shows good progress from Google. Google's Gemini 2.5 Pro has improved over its predecessors, with a Plausible@1 score of 38.3% vs. 33.2% for the previous generation. Despite this, it still falls short of the Claude and DeepSeek models.

1

0

André Silva

@andre15silva_

5 months

1️⃣ Quasar-Alpha is a strong contender. Quasar-alpha, a stealth model available on OpenRouter, has made the news. With a Plausible@1 score of 40.5%, quasar-alpha is approaching the performance of leading models like Claude-3.5. The big question is: who is behind quasar-alpha?

1

0

André Silva

@andre15silva_

5 months

🔧 This week on RepairBench:
- Quasar-Alpha is a strong contender.
- Gemini 2.5 Pro shows good progress from Google.
- Llama 4's underwhelming performance.

1

0

3

André Silva

@andre15silva_

6 months

More details:

repairbench.github.io

Explore RepairBench, the leaderboard of frontier models for program repair.

0

1

André Silva

@andre15silva_

6 months

Just added claude 3.7 sonnet (non-thinking mode) to repairbench! Strong improvement and almost on par with o3-mini and deepseek-r1. Wonder how far the thinking mode will land from the current top.

1

0

3

André Silva

@andre15silva_

6 months

(probably good evidence that this new generation of models can handle more autonomy than the previous one).

0

André Silva

@andre15silva_

6 months

claude 3.7 sonnet working so hard that even copilot thinks it's stuck

1

0

1

André Silva

@andre15silva_

6 months

RT @mokita_j: 🔥 sb-heists is now featured in this week’s @blockthreat newsletter! 🔍 91 reproducible exploits for 9 blockchain vulnerabilitie…

0

1

0

André Silva

@andre15silva_

7 months

RT @martinmonperrus: OpenAI strikes back and reclaims first place 🥇 on the RepairBench leaderboard for automated bug fixing. https://t.co/…

0

1

0