Parshin Shojaee ✈️ NeurIPS

@ParshinShojaee

Followers: 3K · Following: 3K · Media: 27 · Statuses: 330

PhD student @VT_CS | AI for Science, Math, Code, Reasoning | Intern @Apple | prev @Adobe

Arlington, VA
Joined January 2020
@ParshinShojaee
Parshin Shojaee ✈️ NeurIPS
8 months
Scientific discovery with LLMs has so much potential yet is underexplored. Our new benchmark **LLM-SRBench** enables rigorous evaluation of equation discovery with LLMs! 🧠 Key takeaway: even SOTA discovery models with strong LLM backbones still fail to discover mathematical
4
33
207
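[Editor's note: the tweet doesn't show how candidate equations are scored. A minimal sketch of the kind of numeric check an equation-discovery benchmark might run; the NMSE metric and the toy equations below are illustrative assumptions, not LLM-SRBench's actual API or data.]

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean squared error: 0 means a perfect fit."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

# Hypothetical ground-truth law and an LLM-proposed candidate.
x = np.linspace(0.1, 10, 200)
y_true = 3.0 * np.sin(x) + 0.5 * x          # hidden target equation
y_candidate = 2.9 * np.sin(x) + 0.55 * x    # what the model "discovered"

print(f"NMSE of candidate: {nmse(y_true, y_candidate):.4f}")
```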
@ParshinShojaee
Parshin Shojaee ✈️ NeurIPS
9 days
I’ll be at #NeurIPS in San Diego next week (Dec 1-7)! Would love to meet and chat. Hit me up if you want to talk about reasoning, open-endedness, scientific discovery, or anything else!!
4
3
73
@EricBigelow
Eric Bigelow
28 days
📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9
9
22
133
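[Editor's note: activation steering, one of the two strategies the thread compares, is typically implemented by adding a fixed direction to a layer's activations at inference time. A minimal PyTorch sketch; the hook point, toy layer, and steering scale are illustrative assumptions, not the paper's setup.]

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 16
layer = nn.Linear(hidden, hidden)   # stand-in for one transformer block

# A steering vector, e.g. a mean activation difference between two concepts.
steer = torch.randn(hidden)
alpha = 4.0                         # steering strength (assumed)

def steering_hook(module, inputs, output):
    # Shift the layer's output along the steering direction.
    return output + alpha * steer

handle = layer.register_forward_hook(steering_hook)
x = torch.randn(1, hidden)
print("steered:", layer(x)[0, :4])
handle.remove()
print("plain:  ", layer(x)[0, :4])
```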
@sineadwilliamso
Sinead Williamson
1 month
📢 We’re looking for a researcher in cogsci, neuroscience, linguistics, or related disciplines to work with us at Apple Machine Learning Research! We're hiring for a one-year interdisciplinary AIML Resident to work on understanding reasoning and decision making in LLMs. 🧵
9
57
310
@GoodfireAI
Goodfire
1 month
LLMs memorize a lot of training data, but memorization is poorly understood. Where does it live inside models? How is it stored? How much is it involved in different tasks? @jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7)
10
134
819
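[Editor's note: the thread's tool is loss curvature. One standard way to probe curvature without forming the full Hessian is a Hessian-vector product via double backprop; a minimal sketch on a toy linear model, where the model and data are placeholders, not the paper's setup.]

```python
import torch

torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)
x, y = torch.randn(20, 5), torch.randn(20)

loss = ((x @ w - y) ** 2).mean()
(grad,) = torch.autograd.grad(loss, w, create_graph=True)

v = torch.randn(5)                              # probe direction
hvp = torch.autograd.grad(grad @ v, w)[0]       # Hessian-vector product H v
print("curvature along v:", (v @ hvp).item())   # v^T H v
```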
@lisabdunlap
Lisa Dunlap @NeurIPS
1 month
So is the formula to just name the most famous institutions and call it an X paper? Neither the first nor the last author is from Anthropic or Stanford. I get that reputation matters for publicity, but it does seem a little disrespectful
@rohanpaul_ai
Rohan Paul
1 month
New Stanford+Anthropic paper shows long step-by-step prompts can break model safety and trigger harmful answers. 😟 Long reasoning can quietly neutralize safety checks that people assume are working. The trick adds a benign puzzle and long reasoning before the harmful ask, plus
18
25
437
@tarngerine
julius tarng cyber inspector
2 months
What happens when you turn a designer into an interpretability researcher? They spend hours staring at feature activations in SVG code to see if LLMs actually understand SVGs. It turns out – yes~ We found that semantic concepts transfer across text, ASCII, and SVG:
14
96
750
@realJessyLin
Jessy Lin
2 months
As part of our recent work on memory layer architectures, I wrote up some of my thoughts on the continual learning problem broadly: Blog post: https://t.co/HNLqfNsQfN Some of the exposition goes beyond mem layers, so I thought it'd be useful to highlight separately:
25
174
1K
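[Editor's note: for context on the memory-layer architectures the post builds on, a memory layer typically swaps a dense FFN for a large learned key-value table queried via top-k nearest keys. A minimal sketch of that lookup; the sizes and the softmax-over-top-k choice are assumptions, not the paper's design.]

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_slots, d = 1024, 64
keys = torch.randn(n_slots, d)      # learned memory keys
values = torch.randn(n_slots, d)    # learned memory values

def memory_lookup(query, k=8):
    scores = query @ keys.T                    # similarity to every key
    top_scores, idx = scores.topk(k)           # keep only the top-k slots
    weights = F.softmax(top_scores, dim=-1)    # normalize over the k hits
    return weights @ values[idx]               # weighted sum of their values

out = memory_lookup(torch.randn(d))
print(out.shape)  # torch.Size([64])
```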
@EranMalach
Eran Malach
2 months
SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: https://t.co/bCzxawF452 🧵
6
84
414
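[Editor's note: as background for readers new to SSMs, the core of a discretized linear state-space model is a recurrence that carries a small hidden state across a long sequence at constant memory per step. A toy sketch with minimal shapes, not a real SSM layer.]

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_state = 50, 4
A = 0.9 * np.eye(d_state)          # state transition (simple decay)
B = rng.normal(size=(d_state, 1))  # input projection
C = rng.normal(size=(1, d_state))  # output projection

x = rng.normal(size=T)             # input sequence
h = np.zeros((d_state, 1))
ys = []
for t in range(T):                 # h_t = A h_{t-1} + B x_t ; y_t = C h_t
    h = A @ h + B * x[t]
    ys.append((C @ h).item())
print("last outputs:", np.round(ys[-3:], 3))
```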
@MFarajtabar
Mehrdad Farajtabar
2 months
Join our innovative team at #Apple as a Research Scientist/Engineer specializing in LLM #Reasoning, #Planning, and General #Intelligence. We are seeking an ideal candidate who:
- Is available to start by the end of this year
- Holds a PhD or will graduate by year-end
- Has 3-5
9
31
257
@aakaran31
Aayush Karan
2 months
We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.
73
248
2K
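[Editor's note: the tweet doesn't spell out the sampling scheme. One family of "better sampling" it gestures at is sharpening the model's own distribution, e.g. drawing from p(x)^α rather than p(x); the toy token distribution and α below are purely illustrative, not the paper's actual algorithm.]

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.40, 0.30, 0.20, 0.10])   # a base model's next-token probs

def power_sample(p, alpha=4.0):
    """Sample from p^alpha (renormalized): sharpens toward likely tokens."""
    q = p ** alpha
    q /= q.sum()
    return rng.choice(len(p), p=q)

draws = [power_sample(p) for _ in range(1000)]
print(np.bincount(draws, minlength=4) / 1000)  # mass concentrates on token 0
```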
@ParshinShojaee
Parshin Shojaee ✈️ NeurIPS
2 months
Happy to be recognized as a top reviewer for #NeurIPS2025! 🎉
18
14
549
@sarahookr
Sara Hooker
2 months
Adaptive but not intelligent. Drone footage from Lior Patel.
34
38
661
@ShijieX60925
Shijie Xia
2 months
🔥 Announcing our new paper: "SR-Scientist: Scientific Equation Discovery With Agentic AI" Most current work using LLMs for scientific discovery, like AlphaEvolve, follows a rigid "generate → evaluate → refine" loop. We challenge this paradigm for equation discovery. Our
5
30
100
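[Editor's note: the "generate → evaluate → refine" loop the tweet challenges is easy to state concretely. A minimal sketch with a stubbed LLM call; `llm_propose` is a hypothetical placeholder for the generator, and the fitness function fits a toy equation to fake data.]

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.1, 10, 100)
y = 2.0 * np.log(x)                     # toy "experimental" data

def llm_propose(best_params):
    """Stand-in for an LLM refining a candidate; here, random perturbation."""
    return best_params + rng.normal(scale=0.1, size=2)

def evaluate(params):
    a, b = params
    return np.mean((y - (a * np.log(x) + b)) ** 2)   # lower is better

best = np.array([1.0, 0.0])
for step in range(200):                  # generate -> evaluate -> refine
    cand = llm_propose(best)
    if evaluate(cand) < evaluate(best):
        best = cand
print("refined params:", best.round(2), "mse:", round(evaluate(best), 4))
```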
@ma_tay_
Taylor Sorensen
2 months
🤖➡️📉 Post-training made LLMs better at chat and reasoning—but worse at distributional alignment, diversity, and sometimes even steering(!) We measure this with our new resource (Spectrum Suite) and introduce Spectrum Tuning (method) to bring them back into our models! 🌈 1/🧵
5
49
194
@chandankreddy
Chandan Reddy
2 months
🎤 I’ll be at #COLM2025 presenting “Quantifying Fairness in LLMs Beyond Tokens: A Semantic & Statistical Perspective” ✨ Oral Spotlight (24/1305 submissions) 📅 Wed, Oct 8 | 🕞 3:45 PM | Poster #44 (4:30 PM) 📄 https://t.co/J8Xo1jz1qu 🤝 Feel free to stop by or ping me!!
0
5
11
@yule_gan
Yulu Gan
2 months
Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for
90
389
3K
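[Editor's note: exploring "directly in parameter space" is the core move of evolution-strategies-style methods: perturb the weights, score each perturbation, and update along the reward-weighted noise. A minimal sketch on a toy objective that illustrates the general idea, not the paper's specific framework.]

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(10)                 # "model parameters"
target = np.arange(10, dtype=float)  # toy optimum

def reward(p):
    return -np.sum((p - target) ** 2)

sigma, lr, pop = 0.1, 0.02, 50
for step in range(300):
    noise = rng.normal(size=(pop, theta.size))      # parameter-space probes
    rewards = np.array([reward(theta + sigma * n) for n in noise])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += lr / (pop * sigma) * noise.T @ adv     # ES gradient estimate
print("distance to optimum:", np.linalg.norm(theta - target).round(3))
```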
@SakanaAILabs
Sakana AI
3 months
We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency. Blog: https://t.co/Bj32AGXC3T Code: https://t.co/UMCSQaeOhd Like AlphaEvolve and its variants, our framework leverages LLMs to
30
248
1K
@natolambert
Nathan Lambert
3 months
Thinking, Searching, and Acting A reflection on reasoning models. It's easy to fixate on the "thinking" that gave reasoning models their name, but just over a year out from o1-preview's release by OpenAI, the core primitives that make up models today have expanded. Searching and
@interconnectsai
Interconnects
3 months
Thinking, Searching, and Acting A reflection on reasoning models. https://t.co/GHx1AOWTfe
8
57
354
@ParshinShojaee
Parshin Shojaee ✈️ NeurIPS
3 months
Our paper on the reasoning illusion asked important questions about the current evaluation paradigm for reasoning models and how they behave with respect to complexity. We hope our findings help the community look beyond benchmarks to better understand the logical scaling & behavior of
arxiv.org
Recent generations of language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved...
@MFarajtabar
Mehrdad Farajtabar
6 months
🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks,
5
11
113
@YiMaTweets
Yi Ma
3 months
Today I was asked by a university reporter to say a few words to new students about how to maintain a competitive edge against future AI technologies. I said "stay Curious, Critical, and Creative." As far as I know, these three C's are what the current technologies are still lacking.
7
15
88