Parshin Shojaee ✈️ NeurIPS

@ParshinShojaee

Followers: 3K · Following: 3K · Media: 27 · Statuses: 330

PhD student @VT_CS | AI for Science, Math, Code, Reasoning | Intern @Apple | prev @Adobe

Arlington, VA
Joined January 2020
@ParshinShojaee
Parshin Shojaee ✈️ NeurIPS
8 months
Scientific discovery with LLMs has so much potential yet is underexplored. Our new benchmark **LLM-SRBench** enables rigorous evaluation of equation discovery with LLMs! 🧠 Key takeaway: even SOTA discovery models with strong LLM backbones still fail to discover mathematical
4
33
207
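[Editor's note: the tweet doesn't show how candidate equations are scored. A minimal sketch of the kind of numeric check an equation-discovery benchmark might run; the NMSE metric and the toy equations below are illustrative assumptions, not LLM-SRBench's actual API or data.]

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean squared error: 0 means a perfect fit."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

# Hypothetical ground-truth law and an LLM-proposed candidate.
x = np.linspace(0.1, 10, 200)
y_true = 3.0 * np.sin(x) + 0.5 * x          # hidden target equation
y_candidate = 2.9 * np.sin(x) + 0.55 * x    # what the model "discovered"

print(f"NMSE of candidate: {nmse(y_true, y_candidate):.4f}")
```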
@ParshinShojaee
Parshin Shojaee ✈️ NeurIPS
9 days
I’ll be at #NeurIPS in San Diego next week (Dec 1-7)! Would love to meet and chat. Hit me up if you want to talk about reasoning, open-endedness, scientific discovery, or anything else!!
4
3
73
@EricBigelow
Eric Bigelow
28 days
📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9
9
22
133
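[Editor's note: activation steering, one of the two strategies the thread compares, is typically implemented by adding a fixed direction to a layer's activations at inference time. A minimal PyTorch sketch; the hook point, toy layer, and steering scale are illustrative assumptions, not the paper's setup.]

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 16
layer = nn.Linear(hidden, hidden)   # stand-in for one transformer block

# A steering vector, e.g. a mean activation difference between two concepts.
steer = torch.randn(hidden)
alpha = 4.0                         # steering strength (assumed)

def steering_hook(module, inputs, output):
    # Shift the layer's output along the steering direction.
    return output + alpha * steer

handle = layer.register_forward_hook(steering_hook)
x = torch.randn(1, hidden)
print("steered:", layer(x)[0, :4])
handle.remove()
print("plain:  ", layer(x)[0, :4])
```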
@sineadwilliamso
Sinead Williamson
1 month
📢 We’re looking for a researcher in cogsci, neuroscience, linguistics, or related disciplines to work with us at Apple Machine Learning Research! We're hiring for a one-year interdisciplinary AIML Resident to work on understanding reasoning and decision making in LLMs. 🧵
9
57
310
@GoodfireAI
Goodfire
1 month
LLMs memorize a lot of training data, but memorization is poorly understood. Where does it live inside models? How is it stored? How much is it involved in different tasks? @jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7)
10
134
819
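[Editor's note: the thread's tool is loss curvature. One standard way to probe curvature without forming the full Hessian is a Hessian-vector product via double backprop; a minimal sketch on a toy linear model, where the model and data are placeholders, not the paper's setup.]

```python
import torch

torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)
x, y = torch.randn(20, 5), torch.randn(20)

loss = ((x @ w - y) ** 2).mean()
(grad,) = torch.autograd.grad(loss, w, create_graph=True)

v = torch.randn(5)                              # probe direction
hvp = torch.autograd.grad(grad @ v, w)[0]       # Hessian-vector product H v
print("curvature along v:", (v @ hvp).item())   # v^T H v
```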
@lisabdunlap
Lisa Dunlap @NeurIPS
1 month
So is the formula to just name the most famous institutions and call it an X paper? Neither the first nor the last author is from Anthropic or Stanford. I get that reputation matters for publicity, but it does seem a little disrespectful
@rohanpaul_ai
Rohan Paul
1 month
New Stanford+Anthropic paper shows long step-by-step prompts can break model safety and trigger harmful answers. 😟 Long reasoning can quietly neutralize safety checks that people assume are working. The trick adds a benign puzzle and long reasoning before the harmful ask, plus
18
25
437
@tarngerine
julius tarng cyber inspector
2 months
What happens when you turn a designer into an interpretability researcher? They spend hours staring at feature activations in SVG code to see if LLMs actually understand SVGs. It turns out – yes~ We found that semantic concepts transfer across text, ASCII, and SVG:
14
96
750
@realJessyLin
Jessy Lin
2 months
As part of our recent work on memory layer architectures, I wrote up some of my thoughts on the continual learning problem broadly: Blog post: https://t.co/HNLqfNsQfN Some of the exposition goes beyond mem layers, so I thought it'd be useful to highlight separately:
25
174
1K
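[Editor's note: for context on the memory-layer architectures the post builds on, a memory layer typically swaps a dense FFN for a large learned key-value table queried via top-k nearest keys. A minimal sketch of that lookup; the sizes and the softmax-over-top-k choice are assumptions, not the paper's design.]

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_slots, d = 1024, 64
keys = torch.randn(n_slots, d)      # learned memory keys
values = torch.randn(n_slots, d)    # learned memory values

def memory_lookup(query, k=8):
    scores = query @ keys.T                    # similarity to every key
    top_scores, idx = scores.topk(k)           # keep only the top-k slots
    weights = F.softmax(top_scores, dim=-1)    # normalize over the k hits
    return weights @ values[idx]               # weighted sum of their values

out = memory_lookup(torch.randn(d))
print(out.shape)  # torch.Size([64])
```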
@EranMalach
Eran Malach
2 months
SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: https://t.co/bCzxawF452 🧵
6
84
414
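[Editor's note: as background for readers new to SSMs, the core of a discretized linear state-space model is a recurrence that carries a small hidden state across a long sequence at constant memory per step. A toy sketch with minimal shapes, not a real SSM layer.]

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_state = 50, 4
A = 0.9 * np.eye(d_state)          # state transition (simple decay)
B = rng.normal(size=(d_state, 1))  # input projection
C = rng.normal(size=(1, d_state))  # output projection

x = rng.normal(size=T)             # input sequence
h = np.zeros((d_state, 1))
ys = []
for t in range(T):                 # h_t = A h_{t-1} + B x_t ; y_t = C h_t
    h = A @ h + B * x[t]
    ys.append((C @ h).item())
print("last outputs:", np.round(ys[-3:], 3))
```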
@MFarajtabar
Mehrdad Farajtabar
2 months
Join our innovative team at #Apple as a Research Scientist/Engineer specializing in LLM #Reasoning, #Planning, and General #Intelligence. We are seeking an ideal candidate who:
- Is available to start by the end of this year
- Holds a PhD or will graduate by year-end
- Has 3-5
9
31
257
@aakaran31
Aayush Karan
2 months
We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.
73
248
2K
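[Editor's note: the tweet doesn't spell out the sampling scheme. One family of "better sampling" it gestures at is sharpening the model's own distribution, e.g. drawing from p(x)^α rather than p(x); the toy token distribution and α below are purely illustrative, not the paper's actual algorithm.]

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.40, 0.30, 0.20, 0.10])   # a base model's next-token probs

def power_sample(p, alpha=4.0):
    """Sample from p^alpha (renormalized): sharpens toward likely tokens."""
    q = p ** alpha
    q /= q.sum()
    return rng.choice(len(p), p=q)

draws = [power_sample(p) for _ in range(1000)]
print(np.bincount(draws, minlength=4) / 1000)  # mass concentrates on token 0
```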
@ParshinShojaee
Parshin Shojaee ✈️ NeurIPS
2 months
Happy to be recognized as a top reviewer for #NeurIPS2025! 🎉
18
14
549
@sarahookr
Sara Hooker
2 months
Adaptive but not intelligent. Drone footage from Lior Patel.
34
38
661
@ShijieX60925
Shijie Xia
2 months
🔥 Announcing our new paper: "SR-Scientist: Scientific Equation Discovery With Agentic AI" Most current work using LLMs for scientific discovery, like AlphaEvolve, follows a rigid "generate → evaluate → refine" loop. We challenge this paradigm for equation discovery. Our
5
30
100
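[Editor's note: the "generate → evaluate → refine" loop the tweet challenges is easy to state concretely. A minimal sketch with a stubbed LLM call; `llm_propose` is a hypothetical placeholder for the generator, and the fitness function fits a toy equation to fake data.]

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.1, 10, 100)
y = 2.0 * np.log(x)                     # toy "experimental" data

def llm_propose(best_params):
    """Stand-in for an LLM refining a candidate; here, random perturbation."""
    return best_params + rng.normal(scale=0.1, size=2)

def evaluate(params):
    a, b = params
    return np.mean((y - (a * np.log(x) + b)) ** 2)   # lower is better

best = np.array([1.0, 0.0])
for step in range(200):                  # generate -> evaluate -> refine
    cand = llm_propose(best)
    if evaluate(cand) < evaluate(best):
        best = cand
print("refined params:", best.round(2), "mse:", round(evaluate(best), 4))
```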
@ma_tay_
Taylor Sorensen
2 months
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
5
49
194
@chandankreddy
Chandan Reddy
2 months
🎤 I’ll be at #COLM2025 presenting “Quantifying Fairness in LLMs Beyond Tokens: A Semantic & Statistical Perspective” ✨ Oral Spotlight (24/1305 submissions) 📅 Wed, Oct 8 | 🕞 3:45 PM | Poster #44 (4:30 PM) 📄 https://t.co/J8Xo1jz1qu 🤝 Feel free to stop by or ping me!!
0
5
11
@yule_gan
Yulu Gan
2 months
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
90
389
3K
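[Editor's note: exploring "directly in parameter space" is the core move of evolution-strategies-style methods: perturb the weights, score each perturbation, and update along the reward-weighted noise. A minimal sketch on a toy objective that illustrates the general idea, not the paper's specific framework.]

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(10)                 # "model parameters"
target = np.arange(10, dtype=float)  # toy optimum

def reward(p):
    return -np.sum((p - target) ** 2)

sigma, lr, pop = 0.1, 0.02, 50
for step in range(300):
    noise = rng.normal(size=(pop, theta.size))      # parameter-space probes
    rewards = np.array([reward(theta + sigma * n) for n in noise])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += lr / (pop * sigma) * noise.T @ adv     # ES gradient estimate
print("distance to optimum:", np.linalg.norm(theta - target).round(3))
```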
@SakanaAILabs
Sakana AI
3 months
We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency. Blog: https://t.co/Bj32AGXC3T Code: https://t.co/UMCSQaeOhd Like AlphaEvolve and its variants, our framework leverages LLMs to
30
248
1K
@natolambert
Nathan Lambert
3 months
Thinking, Searching, and Acting A reflection on reasoning models. It's easy to fixate on the "thinking" that gave reasoning models their name, but just over a year out from o1-preview's release by OpenAI, the core primitives that make up models today have expanded. Searching and
@interconnectsai
Interconnects
3 months
Thinking, Searching, and Acting A reflection on reasoning models. https://t.co/GHx1AOWTfe
8
57
354
@ParshinShojaee
Parshin Shojaee ✈️ NeurIPS
3 months
Our paper on the reasoning illusion asked important questions about the current evaluation paradigm for reasoning models and how they behave with respect to complexity. We hope our findings help the community look beyond benchmarks to better understand the logical scaling & behavior of
arxiv.org
Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved...
@MFarajtabar
Mehrdad Farajtabar
6 months
🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks,
5
11
113
@YiMaTweets
Yi Ma
3 months
Today I was asked by a university reporter to say a few words to new students about how to maintain a competitive edge against future AI technologies. I said "stay Curious, Critical, and Creative." As far as I know, these three C's are what the current technologies are still lacking.
7
15
88