Parshin Shojaee ✈️ NeurIPS
@ParshinShojaee
Followers
3K
Following
3K
Media
27
Statuses
330
PhD student @VT_CS | AI for Science, Math, Code, Reasoning | Intern @Apple | prev @Adobe
Arlington, VA
Joined January 2020
Scientific discovery with LLMs has so much potential yet is underexplored. Our new benchmark **LLM-SRBench** enables rigorous evaluation of equation discovery with LLMs! 🧠Key takeaway: Even SOTA discovery models with strong LLM backbones still fail to discover mathematical
4
33
207
I’ll be at #NeurIPS in San Diego next week (Dec 1-7)! Would love to meet and chat. Hit me up if you want to talk about reasoning, open-endedness, scientific discovery, or anything else!!
4
3
73
📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9
9
22
133
📢 We’re looking for a researcher in cogsci, neuroscience, linguistics, or related disciplines to work with us at Apple Machine Learning Research! We're hiring for a one-year interdisciplinary AIML Resident to work on understanding reasoning and decision making in LLMs. 🧵
9
57
310
LLMs memorize a lot of training data, but memorization is poorly understood. Where does it live inside models? How is it stored? How much is it involved in different tasks? @jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7)
10
134
819
So is the formula to just name the most famous institutions and call it an X paper? Neither the first nor the last author is from Anthropic or Stanford. I get that reputation matters for publicity, but it does seem a little disrespectful
New Stanford+Anthropic paper shows long step-by-step prompts can break model safety and trigger harmful answers. 😟 Long reasoning can quietly neutralize safety checks that people assume are working. The trick adds a benign puzzle and long reasoning before the harmful ask, plus
18
25
437
What happens when you turn a designer into an interpretability researcher? They spend hours staring at feature activations in SVG code to see if LLMs actually understand SVGs. It turns out – yes~ We found that semantic concepts transfer across text, ASCII, and SVG:
14
96
750
As part of our recent work on memory layer architectures, I wrote up some of my thoughts on the continual learning problem broadly: Blog post: https://t.co/HNLqfNsQfN Some of the exposition goes beyond mem layers, so I thought it'd be useful to highlight separately:
25
174
1K
SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: https://t.co/bCzxawF452 🧵
6
84
414
Join our innovative team at #Apple as a Research Scientist/Engineer specializing in LLM #Reasoning, #Planning, and General #Intelligence. We are seeking an ideal candidate who:
- Is available to start by the end of this year
- Holds a PhD or will graduate by year-end
- Has 3-5
9
31
257
We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.
73
248
2K
Adaptive but not intelligent. Drone footage from Lior Patel.
34
38
661
🔥 Announcing our new paper: "SR-Scientist: Scientific Equation Discovery With Agentic AI" Most current work using LLMs for scientific discovery, like AlphaEvolve, follows a rigid "generate → evaluate → refine" loop. We challenge this paradigm for equation discovery. Our
5
30
100
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
5
49
194
🎤 I’ll be at #COLM2025 presenting “Quantifying Fairness in LLMs Beyond Tokens: A Semantic & Statistical Perspective” ✨ Oral Spotlight (24/1305 submissions) 📅 Wed, Oct 8 | 🕞 3:45 PM | Poster #44 (4:30 PM) 📄 https://t.co/J8Xo1jz1qu 🤝 Feel free to stop by or ping me !!
0
5
11
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
90
389
3K
We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency. Blog: https://t.co/Bj32AGXC3T Code: https://t.co/UMCSQaeOhd Like AlphaEvolve and its variants, our framework leverages LLMs to
30
248
1K
Thinking, Searching, and Acting A reflection on reasoning models. It's easy to fixate on the "thinking" that gave reasoning models their name, but just over a year out from o1-preview's release by OpenAI, the core primitives that make up models today have expanded. Searching and
https://t.co/GHx1AOWTfe
8
57
354
Our paper on the reasoning illusion asked important questions about the current evaluation paradigm for reasoning models and how they behave with respect to complexity. We hope that our findings can help us look beyond benchmarks to better understand the logical scaling & behavior of
arxiv.org
Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved...
🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks,
5
11
113
Today I was asked by a university reporter to say a few words to new students about how to maintain a competitive edge against future AI technologies. I said "stay Curious, Critical, and Creative". As far as I know, these three C's are what current technologies are still lacking.
7
15
88