Nolan Koblischke

@astro_nolan

Followers: 382
Following: 2K
Media: 54
Statuses: 425

Language models and astrophysics. PhD student @UofT, formerly @UBC, @EPFL also researching @PolymathicAI

z = 0
Joined July 2015
@astro_nolan
Nolan Koblischke
7 months
You've seen robots trained in simulated environments; what if we could do the same for AI scientists? In our ICML 2025 paper, we introduce GravityBench, a benchmark created to test AI's scientific capabilities through physics simulations. /n
1
2
14
@astro_nolan
Nolan Koblischke
4 days
Since plot reading is a necessary skill for scientific research, this is a big deal!
0
0
1
@astro_nolan
Nolan Koblischke
4 days
GPT-5.2 seems to have fantastic plot reading capabilities, even better than Gemini 3 Pro. I introduced this challenge almost exactly one year ago and it looks like it's basically solved! 🪩 Of course, I will have slightly harder plot reading tasks to share :)
@astro_nolan
Nolan Koblischke
27 days
Gemini 3.0 Pro still has difficulty reading simple plots, as shown by asking it to "pick 10 points that lie on this curve". They added a new `media_resolution` parameter, which I set to high for this test.
1
0
6
@astro_nolan
Nolan Koblischke
14 days
Weekend hack: I tried RL finetuning on my "choose 10 points that lie on this curve" problem, which all the frontier models struggle at. It's a work-in-progress! Specifically I tuned Qwen2.5-VL-3B-Instruct with GRPOTrainer from HuggingFace's TRL. I found that if I only had a
@astro_nolan
Nolan Koblischke
20 days
Claude is still Claude-y, fantastic at coding / agentic tasks but mid at vision. As found in @EpochAIResearch @GregHBurnham https://t.co/yy28zYFzjd
1
0
4
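A rough sketch of how the "choose 10 points that lie on this curve" task could be turned into a scalar reward for RL fine-tuning. The original post is truncated and does not say which reward was used, so everything here is an assumption: the `(x, y)` answer format, the tolerance `tol`, the partial-credit scoring, and the names `parse_points` and `curve_reward` are all hypothetical. Wiring this into TRL's `GRPOTrainer` is deliberately left out.

```python
import re

# Assumed answer format: pairs written as "(x, y)" in the model's text output.
PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_points(text):
    """Extract every "(x, y)" pair from a completion as a list of float tuples."""
    return [(float(x), float(y)) for x, y in PAIR.findall(text)]


def curve_reward(text, f, n_expected=10, tol=0.05):
    """Fraction of the expected points that land within `tol` of the curve y = f(x).

    Only the first `n_expected` parsed points count, and missing points score
    zero, so a short or unparseable answer is penalised.
    """
    points = parse_points(text)[:n_expected]
    hits = sum(1 for x, y in points if abs(y - f(x)) <= tol)
    return hits / n_expected
```

Partial credit is one possible design choice: it gives the policy a gradient even when no completion gets all 10 points right, which may matter if full successes are rare.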
@astro_nolan
Nolan Koblischke
20 days
Christine had the awesome idea of collecting 20 astrophysics papers and seeing whether LLMs could replicate the findings. What amazed me is how quickly she pulled together a team and made it happen! Now we have a solid benchmark for the community to put models to the test!
@christinexye
Christine Ye
20 days
Can frontier language model agents replicate astrophysics research papers? Clearly not yet -- but models are slowly getting better! Excited to finally put out ReplicationBench, the work of an awesome team of astrophysicists from across Stanford's KIPAC, SLAC, and C4DU.
0
1
6
@astro_nolan
Nolan Koblischke
20 days
Claude is still Claude-y, fantastic at coding / agentic tasks but mid at vision. As found in @EpochAIResearch @GregHBurnham https://t.co/yy28zYFzjd
@astro_nolan
Nolan Koblischke
27 days
Gemini 3.0 Pro still has difficulty reading simple plots, as shown by asking it to "pick 10 points that lie on this curve". They added a new `media_resolution` parameter, which I set to high for this test.
1
0
4
@astro_nolan
Nolan Koblischke
27 days
Gemini 3.0 Pro still has difficulty reading simple plots, as shown by asking it to "pick 10 points that lie on this curve". They added a new `media_resolution` parameter, which I set to high for this test.
@astro_nolan
Nolan Koblischke
4 months
GPT-5 (and every other model I've tried) has difficulty reading plots. I generated the blue curve and asked the model to select 10 (x, y) points that lie on the curve shown in the image; the points it chose are plotted in red.
0
0
2
@astro_nolan
Nolan Koblischke
3 months
I'm obsessed with the vision capabilities of language models, which are totally underappreciated. I had a conversation with an employee at a lab who argued that improving vision does not help automate AI research, so it's not a focus. But I'd be much worse at research if I were blindfolded.
2
0
5
@astro_nolan
Nolan Koblischke
4 months
Also, NYC is a really fun city, especially with intern friends who want to make the most out of the summer! Had a great time.
0
0
2
@astro_nolan
Nolan Koblischke
4 months
It was such a great internship! So excited to share more soon 🌌🔭💻
@cosmo_shirley
Shirley Ho
4 months
The final days of summer are upon us, and it is bittersweet to say goodbye to our great group of @PolymathicAI interns! 😭 @JacopoTeneggi @cskokgibbs @astro_nolan @CristianaD2202 @LouisSerrano31 @rachelczhang Here are a few pics to remind us all the fun we had! (and hold your
1
0
5
@astro_nolan
Nolan Koblischke
4 months
Inspired by @kdqg1's suggestion, I ran this test 1000 times with GPT-4.1 and found it occasionally succeeds, proving the capability exists! This suggests RL fine-tuning could improve plot-reading abilities. Future work for my pet project 😄
0
0
2
@astro_nolan
Nolan Koblischke
4 months
GPT-5 (and every other model I've tried) has difficulty reading plots. I generated the blue curve and asked the model to select 10 (x, y) points that lie on the curve shown in the image; the points it chose are plotted in red.
2
0
7
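A minimal sketch of how the plot-reading test described above could be scored once the model's 10 points are in hand. The curve `math.sin`, the tolerance `tol`, and the function name `score_points` are assumptions made for illustration; the post does not say which curve was plotted or how closeness was judged. Vertical distance is used here as a simple proxy for "lies on the curve".

```python
import math


def curve(x):
    # Hypothetical stand-in for the plotted blue curve.
    return math.sin(x)


def score_points(points, f, tol=0.05):
    """Return the fraction of (x, y) points with |y - f(x)| <= tol."""
    if not points:
        return 0.0
    hits = sum(1 for x, y in points if abs(y - f(x)) <= tol)
    return hits / len(points)


# Ten points sampled exactly from the curve, and the same points shifted upward.
good = [(i * 0.5, curve(i * 0.5)) for i in range(10)]
bad = [(x, y + 0.3) for x, y in good]

print(score_points(good, curve))
print(score_points(bad, curve))
```

For steep curves, vertical distance overstates the error, so a nearest-point distance to a densely sampled curve might be a fairer metric.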
@astro_nolan
Nolan Koblischke
4 months
What would you do with 10,000 experts on demand?
0
0
4
@astro_nolan
Nolan Koblischke
5 months
Thanks everyone who came to my poster @icmlconf. I'm so happy to feel the excitement about using physics simulations to test and train science agents.
0
2
15
@astro_nolan
Nolan Koblischke
5 months
If ICLR is “I Clear” why isn’t ICML “I Camel”? 🐪
0
0
8
@astro_nolan
Nolan Koblischke
5 months
I’ll be at ICML next week in Vancouver - hit me up if you’d like to chat about using LLMs in (astro)physics research! I’ll be presenting a poster on GravityBench Thursday July 17 4:30pm-7pm. East Exhibition Hall A-B #E-2504
@astro_nolan
Nolan Koblischke
7 months
You've seen robots trained in simulated environments; what if we could do the same for AI scientists? In our ICML 2025 paper, we introduce GravityBench, a benchmark created to test AI's scientific capabilities through physics simulations. /n
0
1
17
@astro_nolan
Nolan Koblischke
7 months
Looking ahead, we're excited to use simulations - where the ground truth is known by construction - to test if AI agents can recover physical parameters. Combining such simulations with RL training could be a major step towards AI Scientists that truly use the scientific method.
0
0
2
@astro_nolan
Nolan Koblischke
7 months
Interestingly, many AI models rush to their conclusions: even o4-mini-high uses only 33 of its 100 available observations on average. They often make arbitrary assumptions (like assuming a star's mass is 1 kg!) to move quickly through the problem.
1
1
1
@astro_nolan
Nolan Koblischke
7 months
The twist: each agent is restricted to just 100 observations, mimicking the real-world limitations scientists face. This tests not only their scientific reasoning and coding abilities but also their ability to strategically plan their observations.
1
0
0
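To make the 100-observation constraint concrete, here is a toy sketch of a budgeted observation interface. It is not GravityBench's actual API: the class `BudgetedObserver`, the exception `BudgetExhausted`, and the `circular_orbit` stand-in for a binary-star simulation are all hypothetical names chosen for illustration.

```python
import math


class BudgetExhausted(Exception):
    """Raised when the agent asks for more observations than it was allotted."""


class BudgetedObserver:
    """Wraps a simulation and caps how many times it can be observed."""

    def __init__(self, position_fn, budget=100):
        self._position_fn = position_fn
        self.budget = budget
        self.used = 0

    def observe(self, t):
        """Return the simulated position at time t, spending one observation."""
        if self.used >= self.budget:
            raise BudgetExhausted(f"all {self.budget} observations used")
        self.used += 1
        return self._position_fn(t)


def circular_orbit(t, radius=1.0, period=2.0):
    # Toy stand-in for a real orbit: uniform circular motion in the plane.
    phase = 2 * math.pi * t / period
    return (radius * math.cos(phase), radius * math.sin(phase))


observer = BudgetedObserver(circular_orbit, budget=100)
first = observer.observe(0.0)
print(first, observer.budget - observer.used)
```

Because every call spends budget, an agent has to decide where in time to sample instead of densely scanning the whole orbit.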
@astro_nolan
Nolan Koblischke
7 months
We use binary star systems as our simulated environment, challenging AI agents with difficult tasks, such as determining how we've altered the force of gravity. Paper: https://t.co/1dhFY6ld1K Website: https://t.co/pcEkYTmnH7 Code: https://t.co/34TCMIw5RB
1
0
0