EdinburghNLP

@EdinburghNLP

Followers 13K · Following 827 · Media 56 · Statuses 1K

The Natural Language Processing Group at the University of Edinburgh.

Edinburgh, Scotland
Joined May 2017
@EdinburghNLP
EdinburghNLP
7 months
Join our PhD programme in Designing Responsible Natural Language Processing at the UKRI AI Centre for Doctoral Training, University of Edinburgh. Applications are now re-opened for Home fee status candidates (past candidates need not re-apply). https://t.co/PkdXiVLEGr
0 · 4 · 8
@PMinervini
Pasquale Minervini
15 days
Yu (@yuzhaouoe) went for a 3-month internship at MSR Cambridge after working on completely different topics (LLM pre-training, steering, KV cache compression, knowledge augmentation...), and casually improved the state-of-the-art in GUI-using agents 🚀🚀🚀
@yuzhaouoe
Yu Zhao
15 days
Check out our “Learning GUI Grounding with Spatial Reasoning from Visual Feedback”! We reframe GUI grounding as an interactive search task by learning to move a virtual cursor via RL and using visual feedback! Massive improvements on ScreenSpot-v2 (+5.7%) and ScreenSpot-Pro (+110.8%)!
1 · 1 · 10
@yuzhaouoe
Yu Zhao
15 days
Check out our “Learning GUI Grounding with Spatial Reasoning from Visual Feedback”! We reframe GUI grounding as an interactive search task by learning to move a virtual cursor via RL and using visual feedback! Massive improvements on ScreenSpot-v2 (+5.7%) and ScreenSpot-Pro (+110.8%)!
2 · 12 · 14
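The tweet only sketches the method at a high level, so the snippet below is a toy illustration of the interaction loop it suggests: a policy nudges a virtual cursor over a screenshot, receives visual feedback after each move, and is rewarded when the cursor lands on the target element. `ToyScreen`, `random_policy`, and the sparse reward are illustrative stand-ins, not the paper's environment or training setup.

```python
# A minimal, hypothetical sketch of GUI grounding as interactive search:
# a policy moves a virtual cursor, gets feedback after each move, and the
# episode ends when the cursor reaches the target. All names are illustrative.
import random

class ToyScreen:
    """Stand-in environment: one target box on a 1000x1000 screen."""
    def __init__(self, target=(400, 300, 480, 340)):
        self.target = target          # (x1, y1, x2, y2) of the element to ground
        self.cursor = [500, 500]      # cursor starts at the screen centre

    def move_cursor(self, dx, dy):
        self.cursor[0] += dx
        self.cursor[1] += dy

    def cursor_in_target(self):
        x1, y1, x2, y2 = self.target
        return x1 <= self.cursor[0] <= x2 and y1 <= self.cursor[1] <= y2

def random_policy(screen):
    """Placeholder for the learned policy: here it just jitters the cursor."""
    return random.randint(-100, 100), random.randint(-100, 100)

def episode(screen, policy, max_steps=8):
    for _ in range(max_steps):
        if screen.cursor_in_target():     # a trained policy would emit "click" here
            break
        screen.move_cursor(*policy(screen))
    # Sparse reward of the kind an RL objective could optimise.
    return 1.0 if screen.cursor_in_target() else 0.0

print(episode(ToyScreen(), random_policy))
```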
@YftahZ
Yftah Ziser
22 days
Check out our new EMNLP paper! Multilingual fairness is tough: bias behaves differently across languages, and most methods don’t transfer. We make progress with IMSAE, which removes shared bias subspaces across languages, even without target-language data!
@YftahZ
Yftah Ziser
22 days
Multilingual fairness is deceptively hard. Bias behaves differently across languages: grammatical gender in Spanish, social bias in English, morphological cues in Russian. You can’t just “transfer” debiasing and expect it to work. That’s the problem we tackle in our EMNLP paper.
0 · 1 · 11
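For readers unfamiliar with the general idea of removing a shared bias subspace, the sketch below shows a generic projection-based version: estimate a subspace from gendered difference vectors in several source languages, then project it out of embeddings from a language with no labelled data. The helper names and the SVD-based estimate are assumptions for illustration; the paper's IMSAE procedure is more involved.

```python
# Generic sketch of bias-subspace removal across languages (not the IMSAE
# algorithm itself): fit a subspace on source-language difference vectors,
# then project it out of target-language embeddings.
import numpy as np

def bias_subspace(pair_diffs, k=2):
    """Top-k principal directions of stacked (he - she)-style difference vectors."""
    diffs = np.vstack(pair_diffs)                 # shape: (n_pairs, dim)
    diffs -= diffs.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]                                 # shape: (k, dim), orthonormal rows

def remove_subspace(embeddings, basis):
    """Project embeddings onto the orthogonal complement of the bias subspace."""
    proj = embeddings @ basis.T @ basis           # component inside the subspace
    return embeddings - proj

rng = np.random.default_rng(0)
source_lang_diffs = [rng.normal(size=(50, 300)) for _ in range(3)]   # 3 source languages
basis = bias_subspace(source_lang_diffs, k=2)
target_lang_embs = rng.normal(size=(100, 300))                        # no target-language labels needed
debiased = remove_subspace(target_lang_embs, basis)
print(debiased.shape)
```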
@PontiEdoardo
Edoardo Ponti
1 month
⚠️ Only 2 days remaining to apply for a postdoc at @EdinburghNLP! ⚠️
@PontiEdoardo
Edoardo Ponti
2 months
I am looking for a 2-year 𝗽𝗼𝘀𝘁𝗱𝗼𝗰 to work on efficient foundation models at @InfAtEd and @EPCCed! This is part of the @ARIA_research funding for Scaling Compute: AI at 1/1000th the cost
0 · 5 · 16
@siddarthv66
Siddarth Venkatraman
1 month
NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵below!
22 · 102 · 787
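As a rough, hypothetical reading of the tweet (the thread has the real details), recursive self-aggregation at test time could look like the loop below: sample a population of candidate solutions, then repeatedly ask the model to merge small groups of candidates into improved ones. `sample_solution` and `aggregate` stand in for LLM calls, and the shrinking-population schedule is an assumption, not the paper's exact procedure.

```python
# Hypothetical sketch of a recursive aggregation loop for test-time scaling.
import random

def sample_solution(llm, problem):
    return llm(f"Solve step by step:\n{problem}")

def aggregate(llm, problem, group):
    joined = "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(group))
    return llm(f"Problem:\n{problem}\n\n{joined}\n\n"
               "Combine the candidates' best ideas into one improved solution.")

def recursive_self_aggregation(llm, problem, n_samples=16, group_size=4):
    # Round 0: sample a population of independent candidate solutions.
    candidates = [sample_solution(llm, problem) for _ in range(n_samples)]
    # Later rounds: aggregate small groups into refined candidates until one remains.
    while len(candidates) > 1:
        random.shuffle(candidates)
        groups = [candidates[i:i + group_size] for i in range(0, len(candidates), group_size)]
        candidates = [aggregate(llm, problem, g) for g in groups]
    return candidates[0]

# Usage with any callable mapping prompt -> text, e.g. a local model wrapper:
# best = recursive_self_aggregation(my_llm, "What is 17 * 24?")
```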
@rohit_saxena
Rohit Saxena
1 month
Accepted @ NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle. #NeurIPS2025
@rohit_saxena
Rohit Saxena
8 months
Can multimodal LLMs truly understand research poster images?📊 🚀 We introduce PosterSum—a new multimodal benchmark for scientific poster summarization! 🪧 📂 Dataset: https://t.co/B5NzvqnWUA 📜 Paper: https://t.co/EHt4SwaGF3
0 · 2 · 11
@PMinervini
Pasquale Minervini
2 months
Really happy this is now out!
nature.com
Nature Machine Intelligence - Ilievski et al. examine differences and similarities in the various ways human and AI systems generalize. The insights are important for effectively supporting...
@bravo_abad
Jorge Bravo Abad
2 months
Aligning how humans and AI generalize: Humans and machines learn in very different ways. People abstract concepts from a few examples and apply them flexibly, mixing common sense, analogy, and causal stories. Today’s AI systems mostly learn patterns from huge datasets and do well
0 · 1 · 13
@PMinervini
Pasquale Minervini
1 month
My amazing collaborators will be presenting two works at NeurIPS (@NeurIPSConf) on neuro-symbolic diffusion models (by the nesy superstar @EmilevanKrieken) and on multi-modal long-context evaluation! (led by the incredible @zhaoweiwang4) 👇
1 · 14 · 79
@bravo_abad
Jorge Bravo Abad
2 months
Aligning how humans and AI generalize: Humans and machines learn in very different ways. People abstract concepts from a few examples and apply them flexibly, mixing common sense, analogy, and causal stories. Today’s AI systems mostly learn patterns from huge datasets and do well
1 · 12 · 62
@mundt_martin
Martin Mundt
2 months
🎉"Aligning generalization between humans and machines" (w/ 25 incredible authors) is out now in #Nature Machine Intelligence: https://t.co/iHl4uikJ4f In short, we identified interdisciplinary commonalities & differences for notions of, methods for & evaluation of generalization
0 · 2 · 12
@PontiEdoardo
Edoardo Ponti
2 months
I am looking for a 2-year 𝗽𝗼𝘀𝘁𝗱𝗼𝗰 to work on efficient foundation models at @InfAtEd and @EPCCed! This is part of the @ARIA_research funding for Scaling Compute: AI at 1/1000th the cost
1 · 17 · 30
@PontiEdoardo
Edoardo Ponti
2 months
With SEMI🌓, you can integrate entirely new modalities (satellite images, galaxies, inertia measurements, molecules, ...) into LLMs with as few as 32 samples!
@ospanbatyr
Osman Batur İnce
2 months
Multimodal models typically need millions of examples from each modality paired with text for training. With SEMI 🌓, we integrate new low-resource modalities into LLMs with as few as 32 samples — including satellite images, galaxies, sensors, and molecules. (1/6)
0 · 4 · 34
@ospanbatyr
Osman Batur İnce
2 months
Multimodal models typically need millions of examples from each modality paired with text for training. With SEMI 🌓, we integrate new low-resource modalities into LLMs with as few as 32 samples — including satellite images, galaxies, sensors, and molecules. (1/6)
3 · 39 · 213
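The tweet does not spell out the architecture, but the few-shot setting it describes typically rests on training only a small projector from a frozen modality encoder into the LLM's embedding space. The sketch below illustrates that general recipe; the module, dimensions, and sample data are assumptions, not SEMI's actual design.

```python
# Hedged sketch of few-shot modality integration: freeze the modality encoder
# and the LLM, train only a small projector on a handful of paired examples.
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    def __init__(self, enc_dim=512, llm_dim=4096, n_tokens=8):
        super().__init__()
        self.n_tokens = n_tokens
        self.proj = nn.Linear(enc_dim, llm_dim * n_tokens)

    def forward(self, features):                 # features: (batch, enc_dim)
        out = self.proj(features)                # (batch, llm_dim * n_tokens)
        # Reshape into a short sequence of soft "tokens" the LLM can attend to.
        return out.view(features.size(0), self.n_tokens, -1)

# With only ~32 paired samples, everything except this projector stays frozen.
projector = ModalityProjector()
fake_galaxy_features = torch.randn(32, 512)      # e.g. outputs of a frozen galaxy-image encoder
soft_tokens = projector(fake_galaxy_features)
print(soft_tokens.shape)                         # torch.Size([32, 8, 4096])
```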
@e_giunchiglia
Eleonora Giunchiglia
2 months
🚀 Excited to see our work on PiCSAR out! Thrilled to have Joshua as a co-author — and even more thrilled that he’ll be joining my group this academic year. Big things ahead!
@joshuaongg21
Joshua Ong @ EMNLP2025
2 months
We introduce PiCSAR (Probabilistic Confidence Selection And Ranking)💡: A simple training-free method for scoring samples based on probabilistic confidence, selecting a reasoning chain with the highest confidence from multiple sampled responses. ✏️PiCSAR is generalisable across
0 · 3 · 12
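As a minimal illustration of the training-free selection idea described above (not the paper's exact confidence score), one can rank sampled reasoning chains by a probabilistic confidence such as mean token log-probability and keep the best one:

```python
# Training-free selection among sampled reasoning chains, using mean token
# log-probability as a stand-in confidence score. Illustrative only.
def mean_logprob(token_logprobs):
    """Confidence of one sampled chain = average log-probability of its tokens."""
    return sum(token_logprobs) / len(token_logprobs)

def select_most_confident(samples):
    """samples: list of (text, token_logprobs) pairs generated from the same prompt."""
    return max(samples, key=lambda s: mean_logprob(s[1]))[0]

# Toy usage with made-up log-probs for three sampled chains:
samples = [
    ("Chain A ... answer: 42", [-0.7, -0.3, -1.2, -0.4]),
    ("Chain B ... answer: 41", [-1.5, -0.9, -2.0, -1.1]),
    ("Chain C ... answer: 42", [-0.2, -0.4, -0.5, -0.3]),
]
print(select_most_confident(samples))   # picks Chain C, the most confident sample
```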
@PMinervini
Pasquale Minervini
2 months
the bitter lesson hits again -- a while back we did a systematic analysis of many ways of speeding up pre-training ( https://t.co/6dQR1iLYQp, NeurIPS 2023) and TLDR, just tuning Adam and decaying the learning rate still gets you SOTA
@percyliang
Percy Liang
2 months
We did a very careful study of 10 optimizers with no horse in the race. Despite all the excitement about Muon, Mars, Kron, Soap, etc., at the end of the day, if you tune the hyperparameters rigorously and scale up, the speedup over AdamW diminishes to only 10% :-( Experiments
0 · 3 · 21
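For context, the baseline both tweets point to is plain AdamW with a carefully tuned peak learning rate, warmup, and a decay schedule. The snippet below shows such a setup in PyTorch; the hyperparameter values are placeholders, not the settings used in either study.

```python
# Tuned AdamW with linear warmup and cosine learning-rate decay.
# Hyperparameters are illustrative placeholders.
import math
import torch

model = torch.nn.Linear(1024, 1024)              # stand-in for a transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              betas=(0.9, 0.95), weight_decay=0.1)

warmup_steps, total_steps = 1_000, 100_000

def lr_lambda(step):
    if step < warmup_steps:                      # linear warmup
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay to 0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# In the training loop: optimizer.step(); scheduler.step() each iteration.
```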
@PontiEdoardo
Edoardo Ponti
2 months
I've been awarded a Starting Grant from @ERC_Research! As part of AToM-FM ⚛️, I'll study efficient architectures for foundation models with end-to-end tokenisation and adaptive+permanent memory. Building a greener, more democratic AI.
@ERC_Research
European Research Council (ERC)
2 months
📣 The ERC Starting Grant call results are out! Find out which early-career researchers will receive funding, what they will be investigating, where they will be based... plus lots of other #ERCStG facts & figures for 2025! ➡️ https://t.co/cGctMhcJos 🇪🇺 #HorizonEurope
14 · 17 · 142
@PontiEdoardo
Edoardo Ponti
2 months
Apply to ELLIS if you’d like to do a PhD in NLP/ML spending time in two different European universities!
@ELLISforEurope
ELLIS
2 months
🎓 Interested in a #PhD in machine learning or #AI? The ELLIS PhD Program connects top students with leading researchers across Europe. The application portal opens on Oct 1st. Curious? Join our info session on the same day. Get all the info 👉 https://t.co/0Tq58uexHk #ELLISPhD
0 · 2 · 21
@joshuaongg21
Joshua Ong @ EMNLP2025
2 months
We introduce PiCSAR (Probabilistic Confidence Selection And Ranking)💡: A simple training-free method for scoring samples based on probabilistic confidence, selecting a reasoning chain with the highest confidence from multiple sampled responses. ✏️PiCSAR is generalisable across
2 · 30 · 93
@sleight_henry
🚀Henry is launching the Astra Research Program!
2 months
🧵7/8 Inverse Scaling in Test-Time Compute: led by @aryopg, with @haeggee, @RunjinChen, @andyarditi, Jacob Goldman-Wetzler, @KitF_T, @petrini_linda, @_julianmichael_, Beatrice Alex, @PMinervini, @yanda_chen_, @JoeJBenton, and @EthanJPerez. https://t.co/KPBOjn39qw
@aryopg
Aryo Pradipta Gema
3 months
New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
1 · 2 · 9
@_kire_kara_
Erik Arakelyan
2 months
Our method for achieving more faithful, verifiable and robust #LLM reasoning (FLARE 💫) has been accepted at #EMNLP2025 @emnlpmeeting ! Be sure to check out: https://t.co/cSHn97iLVJ Work done with the amazing @PMinervini @PSH_Lewis @pat_verga @IAugenstein
arxiv.org
Modern Question Answering (QA) and Reasoning approaches based on Large Language Models (LLMs) commonly use prompting techniques, such as Chain-of-Thought (CoT), assuming the resulting generation...
@_kire_kara_
Erik Arakelyan
1 year
👋Psst! Want more faithful, verifiable and robust #LLM reasoning than with CoT, but using external solvers is meh? Our FLARE💫uses Logic Programming with Exhaustive Simulated Search to achieve this.🧵 With @PMinervini @PSH_Lewis @pat_verga @IAugenstein https://t.co/cSHn97iLVJ
0 · 7 · 27