CosmicAI @CosmicAI_Inst tweet - Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy! A new benchmark developed by researchers at CosmicAI tests how well LLMs implement scientific workflows in astronomy and visualize results. https://t.co/0VfYodneZ8

CosmicAI

@CosmicAI_Inst

22 days

Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy! A new benchmark developed by researchers at CosmicAI tests how well LLMs implement scientific workflows in astronomy and visualize results.

Replies

CosmicAI

@CosmicAI_Inst

22 days

✅ Accepted as part of the NeurIPS Datasets and Benchmarks Track 2025! 🔎 Findings: Even the best LLMs struggle to execute scientific workflows.

CosmicAI

@CosmicAI_Inst

22 days

AstroVisBench was created by @sebajoed, @Murtazahusaintx, Stella Offner, @StephaJuneau, @paultorrey9, Adam Bolton, Juan P. Farias, Niall Gaffney, @gregd_nlp, and @jessyjli.

CosmicAI

@CosmicAI_Inst

22 days

AstroVisBench is the first scientific coding benchmark that evaluates whether models:
- aid scientists in the middle of their own workflows, when they do not know the step-by-step workflow and may not know in advance what scientific utility a visualization would bring

CosmicAI

@CosmicAI_Inst

22 days

- handle long-tail knowledge adequately, especially the use of domain-specific APIs and visualization generation
- interact with a variety of data formats to create diverse visualizations that comply with expert standards

CosmicAI

@CosmicAI_Inst

22 days

Learn more:
Website: https://t.co/gahO3o61r5
Paper: https://t.co/Ood6lR6oIB
@nyuniversity @NSF @SimonsFdn @OdenInstitute @UVA @TACC @NOIRLabScience @SLAClab