Neel Rajani
@NeelRajani_
Followers: 207 · Following: 10K · Media: 23 · Statuses: 275
PhD student in responsible NLP @InfAtEd. Passionate about LLM interpretability and alignment
Edinburgh, Scotland
Joined May 2013
🚨New paper alert!🚨 "Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them" at @ActInterp, ICML'25. @deepseek_ai popularised RLVR and distillation for 'reasoning training'! But how do the two differ under the hood? Details in 🧵: (1/8)
Check out our “Learning GUI Grounding with Spatial Reasoning from Visual Feedback”! We reframe GUI grounding as an interactive search task by learning to move a virtual cursor via RL and using visual feedback! Massive improvements on ScreenSpot-v2 (+5.7%) and ScreenSpot-Pro (+110.8%)!
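A minimal sketch of what this interactive cursor search could look like, purely as illustration: the model repeatedly sees the screenshot with the current cursor overlaid and either moves the cursor or clicks. The `model.predict` and `draw_cursor` interfaces and the action format are hypothetical stand-ins, not the paper's actual API.

```python
# Sketch only: GUI grounding as interactive cursor search with visual feedback.
def locate_element(model, draw_cursor, screenshot, instruction,
                   width, height, max_steps=10):
    """Move a virtual cursor step by step until the model decides to click."""
    x, y = width // 2, height // 2                  # start at the screen centre
    for _ in range(max_steps):
        view = draw_cursor(screenshot, x, y)        # visual feedback: cursor overlay
        action = model.predict(view, instruction)   # {"move": (dx, dy)} or {"click": True}
        if action.get("click"):
            return x, y                             # predicted grounding coordinates
        dx, dy = action["move"]
        x = min(max(x + dx, 0), width - 1)          # clamp the cursor to the screen
        y = min(max(y + dy, 0), height - 1)
    return x, y                                     # fall back to the last position
```

In the RL framing the tweet describes, the reward would presumably come from whether the final click lands on the target element.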
This 2024 Google paper on improving Adam via natural gradients/FIM mentions in its acknowledgements that it took inspiration from Math YouTube videos. Finally something relatable! https://t.co/wmir7Vxu1l
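For context, the natural-gradient view preconditions the gradient with the inverse Fisher information matrix, and diagonal approximations of the Fisher are closely related to Adam's second-moment term. A toy sketch of a diagonal, Fisher-style preconditioned step, offered as my own illustration rather than the paper's algorithm:

```python
# Toy illustration of diagonal Fisher-style preconditioning (not the paper's
# method): keep a running estimate of E[g^2] per parameter and scale each
# gradient coordinate by it, much like Adam's second-moment term.
import numpy as np

def preconditioned_step(theta, grad, fisher_diag, lr=1e-3, beta=0.999, eps=1e-8):
    fisher_diag = beta * fisher_diag + (1 - beta) * grad ** 2  # running diagonal estimate
    theta = theta - lr * grad / (np.sqrt(fisher_diag) + eps)   # precondition the update
    return theta, fisher_diag

theta, fisher = np.zeros(3), np.zeros(3)
for g in ([1.0, 0.1, 0.01], [0.9, 0.2, 0.0]):                  # two fake gradient steps
    theta, fisher = preconditioned_step(theta, np.array(g), fisher)
```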
How shameful. All the engineers paid 6+ figures to create such a product should seriously question why, instead of using their highly specialised skills to make the world a better place, they are literally building the infinite slop machine (!)
🚨 Before Sam puts personalized ads in your AI chats… Take our 5 min survey & discover what LLMs actually know about you! 🤖💡 Your responses will help build better AI privacy safeguards.
Extremely impressive paper by @ospanbatyr, still can't believe this works!
Multimodal models typically need millions of examples from each modality paired with text for training. With SEMI 🌓, we integrate new low-resource modalities into LLMs with as few as 32 samples — including satellite images, galaxies, sensors, and molecules. (1/6)
I've been awarded a Starting Grant from @ERC_Research! As part of AToM-FM ⚛️, I'll study efficient architectures for foundation models with end-to-end tokenisation and adaptive+permanent memory. Building a greener, more democratic AI.
📣 The ERC Starting Grant call results are out! Find out which early-career researchers will receive funding, what they will be investigating, where they will be based... plus lots of other #ERCStG facts & figures for 2025! ➡️ https://t.co/cGctMhcJos 🇪🇺 #HorizonEurope
We introduce PiCSAR (Probabilistic Confidence Selection And Ranking)💡: A simple training-free method for scoring samples based on probabilistic confidence, selecting a reasoning chain with the highest confidence from multiple sampled responses. ✏️PiCSAR is generalisable across
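A rough sketch of the best-of-N selection idea described above, using mean token log-probability as a stand-in confidence score; the paper's exact scoring function may differ, and `generate_with_logprobs` is a hypothetical helper:

```python
# Sketch in the spirit of PiCSAR: sample several reasoning chains, score each
# by a probabilistic confidence (here, mean token log-prob as a stand-in),
# and keep the highest-scoring chain. No training is involved.
def select_by_confidence(generate_with_logprobs, prompt, n_samples=8):
    best_chain, best_score = None, float("-inf")
    for _ in range(n_samples):
        chain, token_logprobs = generate_with_logprobs(prompt)      # one sampled response
        score = sum(token_logprobs) / max(len(token_logprobs), 1)   # mean log-prob confidence
        if score > best_score:
            best_chain, best_score = chain, score
    return best_chain, best_score
```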
Our method for achieving more faithful, verifiable and robust #LLM reasoning (FLARE 💫) has been accepted at #EMNLP2025 @emnlpmeeting! Be sure to check out: https://t.co/cSHn97iLVJ Work done with the amazing @PMinervini @PSH_Lewis @pat_verga @IAugenstein
arxiv.org
Modern Question Answering (QA) and Reasoning approaches based on Large Language Models (LLMs) commonly use prompting techniques, such as Chain-of-Thought (CoT), assuming the resulting generation...
👋Psst! Want more faithful, verifiable and robust #LLM reasoning than with CoT, but using external solvers is meh? Our FLARE 💫 uses Logic Programming with Exhaustive Simulated Search to achieve this. 🧵 With @PMinervini @PSH_Lewis @pat_verga @IAugenstein
https://t.co/cSHn97iLVJ
Just realised I got my first citation :) really excited about this as an academic milestone
Sort of eerie when models generate gibberish but other parts of the completion recognise that
For the @GoodfireAI hackathon, I built a tool to visualize which experts activate the most in gpt-oss! I found that certain experts tend to fire in interpretable contexts, like business, poems, and code.
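As a rough illustration of how such a tool might tally activations, one can count how often each expert appears in the router's top-k choices over a batch of tokens. The shapes and hook setup below are assumptions for the sketch, not the actual gpt-oss internals or the hackathon code:

```python
# Count expert usage in a mixture-of-experts layer from (assumed) router logits
# of shape (n_tokens, n_experts), e.g. captured with a forward hook.
from collections import Counter

import torch

def expert_activation_counts(router_logits: torch.Tensor, top_k: int = 4) -> Counter:
    topk = router_logits.topk(top_k, dim=-1).indices   # (n_tokens, top_k) chosen experts
    return Counter(topk.flatten().tolist())            # how often each expert fired

# Example with fake router logits: 100 tokens routed over 32 experts.
counts = expert_activation_counts(torch.randn(100, 32))
print(counts.most_common(5))                           # the five most-used experts
```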
Recipe is basically the same as that in recent work by @anna_soligo and @EdTurner42
New papers with @EdTurner42 @sen_r @NeelNanda5! We find an 'evil vector' which can both induce and ablate misaligned behaviour 👿. We also open source a bunch of datasets and models to accelerate research on emergent misalignment.
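For readers unfamiliar with the recipe, the generic version of this kind of experiment adds a direction to a layer's activations to induce a behaviour, or subtracts it to ablate it. A minimal PyTorch sketch; the layer path, scale, and vector below are hypothetical, and the papers' actual setup will differ:

```python
# Generic activation-steering sketch: shift one layer's output along a fixed
# direction via a forward hook. Positive scale induces the behaviour tied to
# the vector, negative scale ablates it.
import torch

def add_steering_hook(layer, vector: torch.Tensor, scale: float = 1.0):
    """Register a forward hook that shifts the layer's output along `vector`."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * vector                              # steer every token position
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)

# Usage sketch (hypothetical layer path and vector):
# handle = add_steering_hook(model.model.layers[12], evil_vector, scale=-1.0)
# ...generate text...
# handle.remove()
```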
Yesterday, OpenAI released an open LLM for the first time since 2019! They co-released a paper on malicious fine-tuning to see if adversaries could fine-tune gpt-oss for biorisk/cyber capabilities. Can we also induce emergent misalignment as in previous studies? Indeed, we can:
@saagnikkk Just realised I never cited your work, my apologies! I have submitted a revision to arXiv that includes a citation, and it's in the queue to be announced.
Super timely work by @aryopg on failure cases of reasoning models; make sure to check it out!
New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
This is a super nice read by @ZeroyuHuang!
🚀 Introducing Prefix-RFT to blend SFT and RFT! SFT can learn more complex problems by mimicking, but can have poor generalization. RFT has better overall performance but is limited by the initial policy. Our method, Prefix-RFT, makes the best of both worlds!
Transformers struggle with length generalization and long context. What can we do about it? Our new #TMLR paper with @rolandalong, @paul_smolensky and @JianfengGao0217 shows how to handle the issue using a new attention mechanism called TRA. Curious? Read the 🧵 for more 🤓
@aryopg @seraphinagt @iatitov @PMinervini @edwardbeeching (that is, on Saturday at the @ActInterp workshop). Finally, here's the arXiv link:
arxiv.org
Training large language models (LLMs) for reasoning via maths and code datasets has become a major new focus in LLM post-training. Two particularly popular approaches are reinforcement learning...