Explore tweets tagged as #miniCodeProps
Can LLMs prove that code is correct? New paper: "miniCodeProps: a Minimal Benchmark for Proving Code Properties" https://t.co/emuMuSaGLZ miniCodeProps tests LLMs' ability to prove properties of simple Lean programs. Despite its simplicity, it's challenging! Led by Evan Lohn
@wellecks The authors evaluated two baseline approaches for automatically proving the properties in "miniCodeProps": a next-step tactic generation method using the LLMStep framework, and a full proof generation approach using GPT-4. They found that these baselines were able to prove only
CMU Researchers Propose miniCodeProps: A Minimal AI Benchmark for Proving Code Properties https://t.co/6EiFpMCwOW
#AIinCodeVerification #miniCodeProps #TheoremProving #AutomationInTech #FutureOfProgramming #ai #news #llm #ml #research #ainews #innovation #artificialintelligen…
miniCodeProps: A minimal benchmark for proving code properties. ~ Evan Lohn, Sean Welleck. https://t.co/TGSBqw4rN5
#LLMs #ITP #Lean4 #FunctionalProgramming #Haskell
miniCodeProps: a Minimal Benchmark for Proving Code Properties.
CMU Researchers Propose miniCodeProps: A Minimal AI Benchmark for Proving Code Properties:
miniCodeProps: A minimal benchmark for proving properties of code. ~ Evan Lohn & Sean Welleck. https://t.co/4ak8WEYF2z
#ITP #LeanProver #Lean4 #LLMs
"Despite its simplicity, miniCodeProps is challenging for current LLM-based provers, which succeed in proving about 25 percent of the specifications."
5. miniCodeProps: a Minimal Benchmark for Proving Code Properties https://t.co/emuMuSbeBx At Safe Generative AI, Sunday, Exhibition Hall A https://t.co/IdyxVKi5kt