Nolan Koblischke @astro_nolan X Profile

Nolan Koblischke

@astro_nolan

Followers

382

Following

2K

Media

54

Statuses

425

Language models and astrophysics. PhD student @UofT, formerly @UBC and @EPFL; also researching @PolymathicAI

https://t.co/YYBW78K52Z

z = 0

Joined July 2015

Nolan Koblischke

@astro_nolan

7 months

You've seen robots trained in simulated environments; what if we could do the same for AI scientists? In our ICML 2025 paper, we introduce GravityBench, a benchmark created to test AI's scientific capabilities through physics simulations. /n

1

2

14

Nolan Koblischke

@astro_nolan

4 days

Since plot reading is a necessary skill for scientific research, this is a big deal!

0

1

Nolan Koblischke

@astro_nolan

4 days

GPT-5.2 seems to have fantastic plot reading capabilities, even better than Gemini 3 Pro. I introduced this challenge almost exactly one year ago and it looks like it's basically solved! 🪩 Of course, I will have slightly harder plot reading tasks to share :)

Nolan Koblischke

@astro_nolan

27 days

Gemini 3.0 Pro still has difficulty reading simple plots, as shown by asking it to "pick 10 points that lie on this curve". They added a new `media_resolution` parameter, which I set to high for this test.

1

0

6

Nolan Koblischke

@astro_nolan

14 days

Weekend hack: I tried RL fine-tuning on my "choose 10 points that lie on this curve" problem, which all the frontier models struggle with. It's a work in progress! Specifically, I tuned Qwen2.5-VL-3B-Instruct with GRPOTrainer from HuggingFace's TRL. I found that if I only had a

Nolan Koblischke

@astro_nolan

20 days

Claude is still Claude-y, fantastic at coding / agentic tasks but mid at vision. As found in @EpochAIResearch @GregHBurnham https://t.co/yy28zYFzjd

1

0

4

Nolan Koblischke

@astro_nolan

20 days

Christine had the awesome idea of collecting 20 astrophysics papers and seeing whether LLMs could replicate the findings. What amazed me is how quickly she pulled together a team and made it happen! Now we have a solid benchmark for the community to put models to the test!

Christine Ye

@christinexye

20 days

Can frontier language model agents replicate astrophysics research papers? Clearly not yet -- but models are slowly getting better! Excited to finally put out ReplicationBench, the work of an awesome team of astrophysicists from across Stanford's KIPAC, SLAC, and C4DU.

0

1

6

Nolan Koblischke

@astro_nolan

4 months

GPT-5 (and all models I've tried) have difficulty reading plots. I generated the blue curve and asked the model to select 10 x,y points that lie on the curve shown in the image, which I've plotted in red.

0

2
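
One way to score this kind of test is to measure how far each returned point sits from the curve. The sketch below is only an illustration: the `curve` function, the x-range, and the `tolerance` value are all assumptions, since the original post does not give the curve or a pass criterion.

```python
import math


def curve(x: float) -> float:
    """Hypothetical stand-in for the plotted curve (the real one is not given)."""
    return math.sin(x) + 0.1 * x


def distance_to_curve(px: float, py: float,
                      x_min: float = 0.0, x_max: float = 10.0,
                      samples: int = 2001) -> float:
    """Approximate the shortest distance from (px, py) to the curve by dense sampling."""
    best = float("inf")
    for i in range(samples):
        x = x_min + (x_max - x_min) * i / (samples - 1)
        best = min(best, math.hypot(px - x, py - curve(x)))
    return best


def score_points(points, tolerance: float = 0.05) -> float:
    """Return the fraction of points lying within `tolerance` of the curve."""
    hits = sum(1 for px, py in points if distance_to_curve(px, py) <= tolerance)
    return hits / len(points)


if __name__ == "__main__":
    # Ten points sampled directly from the curve, standing in for a model's answer.
    answer = [(float(x), curve(float(x))) for x in range(10)]
    print(score_points(answer))
```

A score like this could also serve as a reward signal when fine-tuning, though the tolerance would need tuning to the plot's axis scale.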

Nolan Koblischke

@astro_nolan

3 months

I'm obsessed with the vision capabilities of language models; they're totally underappreciated. I had a conversation with an employee at a lab who argued that improving vision does not help automate AI research, so it's not a focus. But I'd be much worse at research if I were blindfolded.

2

0

5

Nolan Koblischke

@astro_nolan

4 months

Also, NYC is a really fun city, especially with intern friends who want to make the most of the summer! Had a great time.

0

2

Nolan Koblischke

@astro_nolan

4 months

It was such a great internship! So excited to share more soon 🌌🔭💻

Shirley Ho

@cosmo_shirley

4 months

The final days of summer are upon us, and it is bittersweet to say goodbye to our great group of @PolymathicAI interns! 😭 @JacopoTeneggi @cskokgibbs @astro_nolan @CristianaD2202 @LouisSerrano31 @rachelczhang Here are a few pics to remind us all the fun we had! (and hold your

1

0

5

Nolan Koblischke

@astro_nolan

4 months

Inspired by @kdqg1's suggestion, I ran this test 1000 times with GPT-4.1 and found it occasionally succeeds, proving the capability exists! This suggests RL fine-tuning could improve plot-reading abilities. Future work for my pet project 😄

0

2

Nolan Koblischke

@astro_nolan

4 months

What would you do with 10,000 experts on demand?

0

4

Nolan Koblischke

@astro_nolan

5 months

Thanks everyone who came to my poster @icmlconf. I'm so happy to feel the excitement about using physics simulations to test and train science agents.

0

2

15

Nolan Koblischke

@astro_nolan

5 months

If ICLR is “I Clear” why isn’t ICML “I Camel”? 🐪

0

8

Nolan Koblischke

@astro_nolan

5 months

I’ll be at ICML next week in Vancouver - hit me up if you’d like to chat about using LLMs in (astro)physics research! I’ll be presenting a poster on GravityBench Thursday July 17 4:30pm-7pm. East Exhibition Hall A-B #E-2504

Nolan Koblischke

@astro_nolan

7 months

Looking ahead, we're excited to use simulations - where the ground truth is known by construction - to test if AI agents can recover physical parameters. Combining such simulations with RL training could be a major step towards AI Scientists that truly use the scientific method.

0

2

Nolan Koblischke

@astro_nolan

7 months

Interestingly, many AI models rush to conclusions: even o4-mini-high uses only 33 of its 100 available observations on average. They often make arbitrary assumptions (like assuming a star's mass is 1 kg!) to move quickly through the problem.

1

Nolan Koblischke

@astro_nolan

7 months

The twist: each agent is restricted to just 100 observations, mimicking the real-world limitations scientists face. This tests not only their scientific reasoning and coding abilities but also their ability to strategically plan their observations.

1

0
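
The observation limit described above can be pictured as a wrapper that counts calls and refuses to answer once the budget is spent. This is a minimal sketch, not GravityBench's actual interface: `ObservationBudget`, `toy_star_position`, and the circular-orbit toy model are all hypothetical names and assumptions made for illustration.

```python
import math


def toy_star_position(t: float, radius: float = 1.0, period: float = 2.0):
    """Hypothetical simulator: position of a star on a circular orbit at time t."""
    angle = 2.0 * math.pi * t / period
    return (radius * math.cos(angle), radius * math.sin(angle))


class ObservationBudget:
    """Wrap an observation function and cap how many times it may be called."""

    def __init__(self, observe_fn, max_observations: int = 100):
        self._observe_fn = observe_fn
        self.max_observations = max_observations
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.max_observations - self.used

    def observe(self, t: float):
        """Return one observation at time t, or raise once the budget is exhausted."""
        if self.used >= self.max_observations:
            raise RuntimeError("observation budget exhausted")
        self.used += 1
        return self._observe_fn(t)


if __name__ == "__main__":
    budget = ObservationBudget(toy_star_position, max_observations=100)
    # An agent that plans ahead might spread its observations over a full orbit.
    samples = [budget.observe(2.0 * i / 100) for i in range(100)]
    print(budget.remaining)
```

Exposing `remaining` lets an agent decide how to spend what is left, which is the planning behavior the 100-observation limit is meant to test.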

Nolan Koblischke

@astro_nolan

7 months

We use binary star systems as our simulated environment, challenging AI agents with difficult tasks, such as determining how we've altered the force of gravity. Paper: https://t.co/1dhFY6ld1K Website: https://t.co/pcEkYTmnH7 Code: https://t.co/34TCMIw5RB

1

0