@CosmicAI_Inst
CosmicAI
22 days
Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy! This new benchmark, developed by researchers at CosmicAI, tests how well LLMs implement scientific workflows in astronomy and visualize the results.

Replies

@CosmicAI_Inst
CosmicAI
22 days
✅ Accepted to the NeurIPS 2025 Datasets and Benchmarks Track! 🔎 Findings: Even the best LLMs struggle to execute scientific workflows.
@CosmicAI_Inst
CosmicAI
22 days
AstroVisBench was created by @sebajoed, @Murtazahusaintx, Stella Offner, @StephaJuneau, @paultorrey9, Adam Bolton, Juan P. Farias, Niall Gaffney, @gregd_nlp, and @jessyjli.
@CosmicAI_Inst
CosmicAI
22 days
AstroVisBench is the first scientific coding benchmark that evaluates whether models:
- aid scientists in the middle of their own workflows, when the step-by-step workflow is not known and the scientific utility a visualization would bring may not be known in advance
@CosmicAI_Inst
CosmicAI
22 days
- handle long-tail knowledge, especially the use of domain-specific APIs and visualization generation
- interact with a variety of data formats to create diverse visualizations that comply with expert standards