Sam Stevens @ NeurIPS 2025
@iamsamstevens
Followers: 331 · Following: 573 · Media: 19 · Statuses: 283
PhD focusing on AI-accelerated scientific discovery; seeking full-time research roles.
Joined August 2022
There are competing views on whether RL can genuinely improve a base model's performance (e.g., pass@128). The answer is both yes and no, largely depending on the interplay between pre-training, mid-training, and RL. We trained a few hundred GPT-2-scale LMs on synthetic
I’m at NeurIPS and giving a keynote at the @imageomics workshop on Saturday! If you want to learn more about my new lab’s research be sure to drop by, and don’t forget to stick around for all the great talks and posters throughout the workshop! https://t.co/5df1CCphmM
Jianyang is a fantastic collaborator. He led BioCLIP 2 to a huge success, and is also a great mentor to younger students and larger teams (FinerCAM, SST). 10/10 recommend working with/hiring Jianyang!
Going to #NeurIPS2025 San Diego? Escape the conference for a couple hours with a morning bird walk in the trails of Balboa Park. 7 am on Thurs, Dec. 4.
Life update: I moved to Silicon Valley to tackle agents' biggest challenges: plasticity and reliability. Today's agents are smart but brittle. They lack plasticity (continual learning and adaptation) and reliability (stable, predictable behavior with bounded failures). These two
BioCLIP 2 -> #neurips25 Spotlight! AC's comment restores my faith in peer review: "I recommend this work for spotlight due to its potential impact in a relatively underexplored area. The work provides a large scale curated dataset, a trained embedding space, and extensive
📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵
Computer Use: Modern Moravec's Paradox. A new blog post arguing that computer-use agents may be the biggest opportunity and challenge for AGI. https://t.co/6fZfTdx710 Table of Contents > Moravec’s Paradox > Moravec's Paradox in 2025 > Computer use may be the biggest opportunity
🧪 Chemists spend many hours planning and replanning synthetic routes for a target molecule to avoid dangerous reactants and intermediates.☠️🚫 🤔 What if an AI agent could plan around them automatically—better and faster than human experts? 🔬 Constrained retrosynthesis
🎉 Excited to share that our paper EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution was accepted at VLDB 2025! 🚀 📢 Reminder: join us at VLDB 2025 in London! 🗓️ Sept 2 (Tue), 10:45 AM – 12:15 PM 📍 Room Wordsworth 4F 📄 https://t.co/ZNAav4ZtoX
#VLDB2025 #LLMs
Welcome back, students! At @imageomics and the @ABCGlobalCenter, we’re kicking off the semester with big questions: How can AI help us better understand and protect life on Earth? #AIforNature #AIforGood @OSUengineering
🚀 Still have a chance to submit to @NeurIPSConf for our Multi-Turn Workshop! 🏆 Best Paper Awards 🎓 10-15 Registration Waivers for student authors 🎤 New panelist: @willccbb from @primeintellect! ⏳ Deadline is August 22—only 10 days left! 🎉 Thanks to our sponsor
🚀 Excited to share our #ACL2025 Findings paper: Explorer — a scalable pipeline that generates diverse web trajectories via exploration, powering generalist GUI agents with strong performance! 📄 https://t.co/KXSsdxXIFQ 🌐 https://t.co/PcYG8QsVqF
#WebAgents #SyntheticData #LLM
As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵
Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own
I'm excited to bring the Imageomics workshop to NeurIPS 2025! Consider submitting your work on ai4ecology, ai4conservation and general ai4science--if you're using images to learn something about the natural world, chances are it's a good fit for the imageomics workshop!
Announcing the @NeurIPSConf 2025 workshop on Imageomics: Discovering Biological Knowledge from Images Using AI! The workshop focuses on the interdisciplinary field between machine learning and biological science. We look forward to seeing you in San Diego! #NeurIPS2025
🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -
Are you at #CVPR2025? RoboSpatial Oral is today! 📅 June 14 (Sat) | 🕐 1:00 PM | 📍Oral Session 4B @ ExHall A2
🔥 VLMs aren’t built for spatial reasoning, yet. They hallucinate free space. Misjudge object fit. Can’t tell below from behind. We built RoboSpatial to tackle that: a dataset for teaching spatial understanding to 2D/3D VLMs for robotics. 📝 Perfect review scores @CVPR 2025
📢 Imageomics showcases Biodiversity + AI at #CVPR2025! 🔬 Jenna Kline presents MMLA 🎤 Jianyang Gu on static segmentation + ViT explainability 📊 Ankit Upadhyay on animal re-ID 🐟 Fish-Vista dataset for aquatic species @ICICLE_AI #AI4Science Read more: imageomics.osu.edu
Heading to #CVPR2025 to present our Oral paper with @NVIDIARobotics! 📅 June 14 (Sat) | 🕐 1:00 PM | 📍Oral Session 4B @ ExHall A2 I’ll also be at the 3D-VLA/VLM and EVAL-FoMo 2 workshops presenting the same work. Come say hi!