 
            
EdinburghNLP (@EdinburghNLP)
The Natural Language Processing Group at the University of Edinburgh.
Edinburgh, Scotland · Joined May 2017
13K Followers · 827 Following · 56 Media · 1K Statuses
            
           Join our PhD programme in Designing Responsible Natural Language Processing at the UKRI AI Centre for Doctoral Training, University of Edinburgh. Applications are now re-opened for Home fee status candidates (past candidates need not re-apply).  https://t.co/PkdXiVLEGr 
          
          
                
0 replies · 4 reposts · 8 likes
              
             Yu (@yuzhaouoe) went for a 3-month internship at MSR Cambridge after working on completely different topics (LLM pre-training, steering, KV cache compression, knowledge augmentation..), and casually improved the state-of-the-art in GUI-using agents 🚀🚀🚀 
Check out our “Learning GUI Grounding with Spatial Reasoning from Visual Feedback”! We reframe GUI grounding as an interactive search task by learning to move a virtual cursor via RL and using visual feedback! Massive improvements on ScreenSpot-v2 (+5.7%) and ScreenSpot-Pro (+110.8%)!
            
                
1 reply · 1 repost · 10 likes
              
Check out our “Learning GUI Grounding with Spatial Reasoning from Visual Feedback”! We reframe GUI grounding as an interactive search task by learning to move a virtual cursor via RL and using visual feedback! Massive improvements on ScreenSpot-v2 (+5.7%) and ScreenSpot-Pro (+110.8%)!
          
                
2 replies · 12 reposts · 14 likes
              
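The grounding-as-interactive-search idea above lends itself to a small illustration: a policy repeatedly moves a virtual cursor and receives visual feedback until the cursor lands on the target element. The sketch below is a toy version under assumed simplifications (a grid instead of a screenshot, four discrete moves, a hand-written greedy policy standing in for the learned RL policy); it is not the paper's implementation.

```python
# Toy stand-in for GUI grounding as interactive cursor search. The "screen" is a
# grid, the target element is a bounding box, and the agent moves a virtual
# cursor until it lands inside the box. All details here are illustrative.
GRID = 20
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def episode(policy, target_box, max_steps=50):
    """Run one episode; `policy(cursor, target_box)` returns an action name.

    In the real setting the policy would be a multimodal LLM that sees a
    rendered screenshot with the cursor drawn on it (the visual feedback),
    not raw coordinates.
    """
    x, y = GRID // 2, GRID // 2          # start the cursor at the screen centre
    x0, y0, x1, y1 = target_box
    for step in range(max_steps):
        if x0 <= x <= x1 and y0 <= y <= y1:
            return 1.0, step             # reward: cursor is over the target element
        dx, dy = ACTIONS[policy((x, y), target_box)]
        x = min(max(x + dx, 0), GRID - 1)
        y = min(max(y + dy, 0), GRID - 1)
    return 0.0, max_steps                # failed to ground within the step budget

def greedy(cursor, box):
    """Hand-written stand-in for the learned policy: step towards the box centre."""
    x, y = cursor
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    if abs(cx - x) > abs(cy - y):
        return "right" if cx > x else "left"
    return "down" if cy > y else "up"

print(episode(greedy, target_box=(3, 4, 5, 6)))  # -> (1.0, 9)
```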
             Check out our new EMNLP paper! Multilingual fairness is tough, bias behaves differently across languages, and most methods don’t transfer. We make progress with IMSAE, which removes shared bias subspaces across languages, even without target-language data! 
Multilingual fairness is deceptively hard. Bias behaves differently across languages: grammatical gender in Spanish, social bias in English, morphological cues in Russian. You can’t just “transfer” debiasing and expect it to work. That’s the problem we tackle in our EMNLP paper.
            
                
0 replies · 1 repost · 11 likes
              
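For readers curious what "removing a shared bias subspace across languages" can look like mechanically, below is a generic subspace-projection sketch in NumPy: pool contrastive difference vectors from several languages, take a low-rank SVD as the shared bias subspace, and project embeddings onto its orthogonal complement. This is a common debiasing recipe used here for illustration; it is not claimed to be the actual IMSAE procedure.

```python
import numpy as np

def shared_bias_subspace(diff_vectors_per_language, k=2):
    """Estimate a shared bias subspace from per-language difference vectors.

    `diff_vectors_per_language` is a list of (n_i, d) arrays, e.g. embedding
    differences of contrastive pairs ("he" - "she", "él" - "ella"). A top-k SVD
    of the pooled, centred differences is one simple way to get a shared
    subspace; the paper's actual procedure may differ.
    """
    pooled = np.concatenate(diff_vectors_per_language, axis=0)
    pooled = pooled - pooled.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(pooled, full_matrices=False)
    return vt[:k]                                  # (k, d) orthonormal basis

def debias(embeddings, basis):
    """Project embeddings onto the orthogonal complement of the bias subspace."""
    return embeddings - embeddings @ basis.T @ basis

# Toy usage with random vectors standing in for real multilingual embeddings.
rng = np.random.default_rng(0)
basis = shared_bias_subspace([rng.normal(size=(50, 300)),   # e.g. English pairs
                              rng.normal(size=(40, 300))],  # e.g. Spanish pairs
                             k=2)
clean = debias(rng.normal(size=(10, 300)), basis)
```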
             ⚠️ Only 2 days remaining to apply for a postdoc at @EdinburghNLP! ⚠️ 
           I am looking for a 2-year 𝗽𝗼𝘀𝘁𝗱𝗼𝗰 to work on efficient foundation models at @InfAtEd and @EPCCed! This is part of the @ARIA_research funding for Scaling Compute: AI at 1/1000th the cost 
            
                
0 replies · 5 reposts · 16 likes
              
             NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵below! 
          
                
22 replies · 102 reposts · 787 likes
              
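RSA is only sketched at a high level in the thread (repeatedly aggregate sampled answers, then train with aggregation-aware RL). A toy version of the test-time loop is below, assuming a `generate(text)` callable that wraps an LLM; the group size, number of rounds, aggregation prompt, and shrinking candidate pool are illustrative guesses rather than the paper's algorithm, and the RL stage is omitted.

```python
import random

def rsa(prompt, generate, n_candidates=8, group_size=4, rounds=3):
    """Toy recursive self-aggregation loop (illustrative, not the paper's code)."""
    # Initial population of independently sampled answers.
    population = [generate(prompt) for _ in range(n_candidates)]

    for _ in range(rounds):
        if len(population) == 1:
            break
        random.shuffle(population)
        new_population = []
        # Ask the model to aggregate each group of candidates into one answer.
        for i in range(0, len(population), group_size):
            group = population[i:i + group_size]
            agg_prompt = (
                prompt
                + "\n\nCandidate solutions:\n"
                + "\n---\n".join(group)
                + "\n\nCombine the strengths of these candidates into one improved answer."
            )
            new_population.append(generate(agg_prompt))
        population = new_population

    return population[0]

# Usage with a stub model (replace the lambda with a real LLM call):
print(rsa("What is 17 * 23?", generate=lambda text: "391"))
```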
             Accepted @ NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle. #NeurIPS2025
          
           Can multimodal LLMs truly understand research poster images?📊 🚀 We introduce PosterSum—a new multimodal benchmark for scientific poster summarization! 🪧 📂 Dataset:  https://t.co/B5NzvqnWUA  📜 Paper:  https://t.co/EHt4SwaGF3 
            
            
                
0 replies · 2 reposts · 11 likes
              
             Really happy this is now out! 
          
            
nature.com · Nature Machine Intelligence: Ilievski et al. examine differences and similarities in the various ways human and AI systems generalize. The insights are important for effectively supporting...
Aligning how humans and AI generalize: Humans and machines learn in very different ways. People abstract concepts from a few examples and apply them flexibly—mixing common sense, analogy, and causal stories. Today’s AI systems mostly learn patterns from huge datasets and do well
            
                
0 replies · 1 repost · 13 likes
              
             My amazing collaborators will be presenting two works at NeurIPS (@NeurIPSConf) on neuro-symbolic diffusion models (by the nesy superstar @EmilevanKrieken) and on multi-modal long-context evaluation! (led by the incredible @zhaoweiwang4) 👇 
          
                
1 reply · 14 reposts · 79 likes
              
Aligning how humans and AI generalize: Humans and machines learn in very different ways. People abstract concepts from a few examples and apply them flexibly—mixing common sense, analogy, and causal stories. Today’s AI systems mostly learn patterns from huge datasets and do well
          
                
1 reply · 12 reposts · 62 likes
              
             🎉"Aligning generalization between humans and machines" (w/ 25 incredible authors) is out now in #Nature Machine Intelligence:  https://t.co/iHl4uikJ4f  In short, we identified interdisciplinary commonalities & differences for notions of, methods for & evaluation of generalization 
          
                
0 replies · 2 reposts · 12 likes
              
             I am looking for a 2-year 𝗽𝗼𝘀𝘁𝗱𝗼𝗰 to work on efficient foundation models at @InfAtEd and @EPCCed! This is part of the @ARIA_research funding for Scaling Compute: AI at 1/1000th the cost 
          
                
1 reply · 17 reposts · 30 likes
              
With SEMI 🌓, you can integrate entirely new modalities (satellite images, galaxies, inertial measurements, molecules, ...) into LLMs with as few as 32 samples!
           Multimodal models typically need millions of examples from each modality paired with text for training. With SEMI 🌓, we integrate new low-resource modalities into LLMs with as few as 32 samples — including satellite images, galaxies, sensors, and molecules. (1/6) 
            
                
0 replies · 4 reposts · 34 likes
              
             Multimodal models typically need millions of examples from each modality paired with text for training. With SEMI 🌓, we integrate new low-resource modalities into LLMs with as few as 32 samples — including satellite images, galaxies, sensors, and molecules. (1/6) 
          
                
3 replies · 39 reposts · 213 likes
              
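The tweets do not spell out how SEMI wires a new modality into the LLM, but the standard recipe they evoke (a frozen modality encoder plus a small trainable projector into the LLM's embedding space, trained on a handful of paired examples) can be sketched as below. The dimensions, projector architecture, and MSE stand-in objective are assumptions for illustration; a real system would train the projector through the LLM's language-modelling loss on the paired text.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: a frozen encoder for the new modality producing
# 768-d features, and an LLM whose token embeddings live in 4096-d space.
ENC_DIM, LLM_DIM = 768, 4096

class ModalityProjector(nn.Module):
    """Small trainable adapter mapping frozen encoder features to LLM embeddings."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(ENC_DIM, LLM_DIM),
            nn.GELU(),
            nn.Linear(LLM_DIM, LLM_DIM),
        )

    def forward(self, features):
        # features: (batch, num_patches, ENC_DIM) -> (batch, num_patches, LLM_DIM)
        return self.proj(features)

projector = ModalityProjector()
optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-4)

# 32 paired examples (e.g. 32 captioned satellite images), with random tensors
# standing in for encoder outputs and for LLM-space training targets.
features = torch.randn(32, 16, ENC_DIM)
targets = torch.randn(32, 16, LLM_DIM)

for epoch in range(100):
    loss = nn.functional.mse_loss(projector(features), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```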
             🚀 Excited to see our work on PiCSAR out! Thrilled to have Joshua as a co-author — and even more thrilled that he’ll be joining my group this academic year. Big things ahead! 
           We introduce PiCSAR (Probabilistic Confidence Selection And Ranking)💡: A simple training-free method for scoring samples based on probabilistic confidence, selecting a reasoning chain with the highest confidence from multiple sampled responses. ✏️PiCSAR is generalisable across 
            
                
0 replies · 3 reposts · 12 likes
              
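The PiCSAR description boils down to: sample several responses, score each by its probabilistic confidence, keep the highest-scoring one. A minimal training-free sketch is below; using the mean token log-probability as the confidence score is an illustrative choice and may differ from the paper's exact scoring function.

```python
def picsar_select(candidates):
    """Pick the response with the highest probabilistic confidence.

    `candidates` is a list of (text, token_logprobs) pairs, e.g. obtained by
    sampling the same prompt several times with an API that returns per-token
    log-probabilities. Mean token log-probability is used as the confidence
    score here for illustration.
    """
    def confidence(token_logprobs):
        return sum(token_logprobs) / max(len(token_logprobs), 1)

    best_text, _ = max(candidates, key=lambda c: confidence(c[1]))
    return best_text

# Toy usage with made-up log-probabilities:
samples = [
    ("The answer is 42.", [-0.2, -0.1, -0.3, -0.05]),
    ("The answer is 41.", [-1.2, -0.9, -1.5, -0.8]),
]
print(picsar_select(samples))  # -> "The answer is 42."
```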
the bitter lesson hits again -- a while back we did a systematic analysis of many ways of speeding up pre-training (https://t.co/6dQR1iLYQp, NeurIPS 2023) and, TL;DR, just tuning Adam and decaying the learning rate still gets you SOTA
           We did a very careful study of 10 optimizers with no horse in the race. Despite all the excitement about Muon, Mars, Kron, Soap, etc., at the end of the day, if you tune the hyperparameters rigorously and scale up, the speedup over AdamW diminishes to only 10% :-( Experiments 
          
                
0 replies · 3 reposts · 21 likes
              
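The point of both tweets is that a carefully tuned AdamW with a decaying learning rate remains a very strong pre-training baseline. For concreteness, a minimal PyTorch sketch of that baseline is below; the model is a stand-in and the hyperparameter values are illustrative placeholders, not the tuned settings from either study.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(512, 512)            # stand-in for a transformer

optimizer = AdamW(
    model.parameters(),
    lr=3e-4,                                 # peak learning rate (placeholder)
    betas=(0.9, 0.95),
    weight_decay=0.1,
)
scheduler = CosineAnnealingLR(optimizer, T_max=1_000, eta_min=3e-5)

for step in range(1_000):
    loss = model(torch.randn(8, 512)).pow(2).mean()   # dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()                         # decay the learning rate each step
```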
I've been awarded a Starting Grant from @ERC_Research! As part of AToM-FM ⚛️, I'll study efficient architectures for foundation models with end-to-end tokenisation and adaptive+permanent memory. Building a greener, more democratic AI.
           📣 The ERC Starting Grant call results are out! Find out which early-career researchers will receive funding, what they will be investigating, where they will be based... plus lots of other #ERCStG facts & figures for 2025! ➡️  https://t.co/cGctMhcJos  🇪🇺 #HorizonEurope
            
            
                
14 replies · 17 reposts · 142 likes
              
Apply to ELLIS if you’d like to do a PhD in NLP/ML, spending time at two different European universities!
           🎓 Interested in a #PhD in machine learning or #AI? The ELLIS PhD Program connects top students with leading researchers across Europe. The application portal opens on Oct 1st. Curious? Join our info session on the same day. Get all the info 👉  https://t.co/0Tq58uexHk 
              #ELLISPhD
            
          
                
0 replies · 2 reposts · 21 likes
              
             We introduce PiCSAR (Probabilistic Confidence Selection And Ranking)💡: A simple training-free method for scoring samples based on probabilistic confidence, selecting a reasoning chain with the highest confidence from multiple sampled responses. ✏️PiCSAR is generalisable across 
          
                
2 replies · 30 reposts · 93 likes
              
🧵7/8 Inverse Scaling in Test-Time Compute: led by @aryopg, with @haeggee, @RunjinChen, @andyarditi, Jacob Goldman-Wetzler, @KitF_T, @petrini_linda, @_julianmichael_, Beatrice Alex, @PMinervini, @yanda_chen_, @JoeJBenton, and @EthanJPerez. https://t.co/KPBOjn39qw
          
           New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵 
            
                
1 reply · 2 reposts · 9 likes
              
Our method for achieving more faithful, verifiable and robust #LLM reasoning (FLARE 💫) has been accepted at #EMNLP2025 @emnlpmeeting! Be sure to check out: https://t.co/cSHn97iLVJ Work done with the amazing @PMinervini @PSH_Lewis @pat_verga @IAugenstein
          
          
            
arxiv.org · Modern Question Answering (QA) and Reasoning approaches based on Large Language Models (LLMs) commonly use prompting techniques, such as Chain-of-Thought (CoT), assuming the resulting generation...
             👋Psst! Want more faithful, verifiable and robust #LLM reasoning than with CoT, but using external solvers is meh? Our FLARE💫uses Logic Programming with Exhaustive Simulated Search to achieve this.🧵 With @PMinervini @PSH_Lewis @pat_verga @IAugenstein
               https://t.co/cSHn97iLVJ 
            
            
                
0 replies · 7 reposts · 27 likes
              
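FLARE is described as getting faithful reasoning from logic programming with exhaustive simulated search, with the search simulated by the LLM rather than handed to an external solver. As a purely illustrative toy of the symbolic side, the snippet below computes the exhaustive forward-chaining closure of a tiny made-up knowledge base; the facts, rule encoding, and query are hypothetical and are not FLARE's actual representation.

```python
# Made-up facts and one rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
facts = {("parent", "ann", "bob"), ("parent", "bob", "carol")}
rules = [
    lambda kb: {
        ("grandparent", x, z)
        for (p1, x, y1) in kb if p1 == "parent"
        for (p2, y2, z) in kb if p2 == "parent" and y2 == y1
    },
]

def exhaustive_closure(facts, rules):
    """Apply every rule until no new facts can be derived (exhaustive search)."""
    kb = set(facts)
    while True:
        derived = set().union(*(rule(kb) for rule in rules)) - kb
        if not derived:
            return kb
        kb |= derived

kb = exhaustive_closure(facts, rules)
print(("grandparent", "ann", "carol") in kb)  # -> True
```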