 
            
EdinburghNLP (@EdinburghNLP)
The Natural Language Processing Group at the University of Edinburgh.
Edinburgh, Scotland · Joined May 2017
13K Followers · 827 Following · 56 Media · 1K Statuses
            
           Join our PhD programme in Designing Responsible Natural Language Processing at the UKRI AI Centre for Doctoral Training, University of Edinburgh. Applications are now re-opened for Home fee status candidates (past candidates need not re-apply).  https://t.co/PkdXiVLEGr 
          
          
                
0 replies · 4 reposts · 8 likes
              
             Yu (@yuzhaouoe) went for a 3-month internship at MSR Cambridge after working on completely different topics (LLM pre-training, steering, KV cache compression, knowledge augmentation..), and casually improved the state-of-the-art in GUI-using agents 🚀🚀🚀 
Check out our “Learning GUI Grounding with Spatial Reasoning from Visual Feedback”! We reframe GUI grounding as an interactive search task by learning to move a virtual cursor via RL and using visual feedback! Massive improvements on ScreenSpot-v2 (+5.7%) and ScreenSpot-Pro (+110.8%)!
            
                
1 reply · 1 repost · 10 likes
              
Check out our “Learning GUI Grounding with Spatial Reasoning from Visual Feedback”! We reframe GUI grounding as an interactive search task by learning to move a virtual cursor via RL and using visual feedback! Massive improvements on ScreenSpot-v2 (+5.7%) and ScreenSpot-Pro (+110.8%)!
          
                
2 replies · 12 reposts · 14 likes
              
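The grounding-as-interactive-search idea above lends itself to a small illustration: a policy repeatedly moves a virtual cursor and receives visual feedback until the cursor lands on the target element. The sketch below is a toy version under assumed simplifications (a grid instead of a screenshot, four discrete moves, a hand-written greedy policy standing in for the learned RL policy); it is not the paper's implementation.

```python
# Toy stand-in for GUI grounding as interactive cursor search. The "screen" is a
# grid, the target element is a bounding box, and the agent moves a virtual
# cursor until it lands inside the box. All details here are illustrative.
GRID = 20
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def episode(policy, target_box, max_steps=50):
    """Run one episode; `policy(cursor, target_box)` returns an action name.

    In the real setting the policy would be a multimodal LLM that sees a
    rendered screenshot with the cursor drawn on it (the visual feedback),
    not raw coordinates.
    """
    x, y = GRID // 2, GRID // 2          # start the cursor at the screen centre
    x0, y0, x1, y1 = target_box
    for step in range(max_steps):
        if x0 <= x <= x1 and y0 <= y <= y1:
            return 1.0, step             # reward: cursor is over the target element
        dx, dy = ACTIONS[policy((x, y), target_box)]
        x = min(max(x + dx, 0), GRID - 1)
        y = min(max(y + dy, 0), GRID - 1)
    return 0.0, max_steps                # failed to ground within the step budget

def greedy(cursor, box):
    """Hand-written stand-in for the learned policy: step towards the box centre."""
    x, y = cursor
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    if abs(cx - x) > abs(cy - y):
        return "right" if cx > x else "left"
    return "down" if cy > y else "up"

print(episode(greedy, target_box=(3, 4, 5, 6)))  # -> (1.0, 9)
```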
             Check out our new EMNLP paper! Multilingual fairness is tough, bias behaves differently across languages, and most methods don’t transfer. We make progress with IMSAE, which removes shared bias subspaces across languages, even without target-language data! 
Multilingual fairness is deceptively hard. Bias behaves differently across languages: grammatical gender in Spanish, social bias in English, morphological cues in Russian. You can’t just “transfer” debiasing and expect it to work. That’s the problem we tackle in our EMNLP paper.
            
                
0 replies · 1 repost · 11 likes
              
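For readers curious what "removing a shared bias subspace across languages" can look like mechanically, below is a generic subspace-projection sketch in NumPy: pool contrastive difference vectors from several languages, take a low-rank SVD as the shared bias subspace, and project embeddings onto its orthogonal complement. This is a common debiasing recipe used here for illustration; it is not claimed to be the actual IMSAE procedure.

```python
import numpy as np

def shared_bias_subspace(diff_vectors_per_language, k=2):
    """Estimate a shared bias subspace from per-language difference vectors.

    `diff_vectors_per_language` is a list of (n_i, d) arrays, e.g. embedding
    differences of contrastive pairs ("he" - "she", "él" - "ella"). A top-k SVD
    of the pooled, centred differences is one simple way to get a shared
    subspace; the paper's actual procedure may differ.
    """
    pooled = np.concatenate(diff_vectors_per_language, axis=0)
    pooled = pooled - pooled.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(pooled, full_matrices=False)
    return vt[:k]                                  # (k, d) orthonormal basis

def debias(embeddings, basis):
    """Project embeddings onto the orthogonal complement of the bias subspace."""
    return embeddings - embeddings @ basis.T @ basis

# Toy usage with random vectors standing in for real multilingual embeddings.
rng = np.random.default_rng(0)
basis = shared_bias_subspace([rng.normal(size=(50, 300)),   # e.g. English pairs
                              rng.normal(size=(40, 300))],  # e.g. Spanish pairs
                             k=2)
clean = debias(rng.normal(size=(10, 300)), basis)
```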
             ⚠️ Only 2 days remaining to apply for a postdoc at @EdinburghNLP! ⚠️ 
           I am looking for a 2-year 𝗽𝗼𝘀𝘁𝗱𝗼𝗰 to work on efficient foundation models at @InfAtEd and @EPCCed! This is part of the @ARIA_research funding for Scaling Compute: AI at 1/1000th the cost 
            
                
0 replies · 5 reposts · 16 likes
              
             NO verifiers. NO Tools. Qwen3-4B-Instruct can match DeepSeek-R1 and o3-mini (high) with ONLY test-time scaling. Presenting Recursive Self-Aggregation (RSA) — the strongest test-time scaling method I know of! Then we use aggregation-aware RL to push further!! 📈📈 🧵below! 
          
                
22 replies · 102 reposts · 787 likes
              
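RSA is only sketched at a high level in the thread (repeatedly aggregate sampled answers, then train with aggregation-aware RL). A toy version of the test-time loop is below, assuming a `generate(text)` callable that wraps an LLM; the group size, number of rounds, aggregation prompt, and shrinking candidate pool are illustrative guesses rather than the paper's algorithm, and the RL stage is omitted.

```python
import random

def rsa(prompt, generate, n_candidates=8, group_size=4, rounds=3):
    """Toy recursive self-aggregation loop (illustrative, not the paper's code)."""
    # Initial population of independently sampled answers.
    population = [generate(prompt) for _ in range(n_candidates)]

    for _ in range(rounds):
        if len(population) == 1:
            break
        random.shuffle(population)
        new_population = []
        # Ask the model to aggregate each group of candidates into one answer.
        for i in range(0, len(population), group_size):
            group = population[i:i + group_size]
            agg_prompt = (
                prompt
                + "\n\nCandidate solutions:\n"
                + "\n---\n".join(group)
                + "\n\nCombine the strengths of these candidates into one improved answer."
            )
            new_population.append(generate(agg_prompt))
        population = new_population

    return population[0]

# Usage with a stub model (replace the lambda with a real LLM call):
print(rsa("What is 17 * 23?", generate=lambda text: "391"))
```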
             Accepted @ NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle. #NeurIPS2025
          
           Can multimodal LLMs truly understand research poster images?📊 🚀 We introduce PosterSum—a new multimodal benchmark for scientific poster summarization! 🪧 📂 Dataset:  https://t.co/B5NzvqnWUA  📜 Paper:  https://t.co/EHt4SwaGF3 
            
            
                
0 replies · 2 reposts · 11 likes
              
             Really happy this is now out! 
          
            
nature.com · Nature Machine Intelligence: Ilievski et al. examine differences and similarities in the various ways human and AI systems generalize. The insights are important for effectively supporting...
Aligning how humans and AI generalize: Humans and machines learn in very different ways. People abstract concepts from a few examples and apply them flexibly—mixing common sense, analogy, and causal stories. Today’s AI systems mostly learn patterns from huge datasets and do well
            
                
0 replies · 1 repost · 13 likes
              
             My amazing collaborators will be presenting two works at NeurIPS (@NeurIPSConf) on neuro-symbolic diffusion models (by the nesy superstar @EmilevanKrieken) and on multi-modal long-context evaluation! (led by the incredible @zhaoweiwang4) 👇 
          
                
1 reply · 14 reposts · 79 likes
              
Aligning how humans and AI generalize: Humans and machines learn in very different ways. People abstract concepts from a few examples and apply them flexibly—mixing common sense, analogy, and causal stories. Today’s AI systems mostly learn patterns from huge datasets and do well
          
                
1 reply · 12 reposts · 62 likes
              
             🎉"Aligning generalization between humans and machines" (w/ 25 incredible authors) is out now in #Nature Machine Intelligence:  https://t.co/iHl4uikJ4f  In short, we identified interdisciplinary commonalities & differences for notions of, methods for & evaluation of generalization 
          
                
0 replies · 2 reposts · 12 likes
              
             I am looking for a 2-year 𝗽𝗼𝘀𝘁𝗱𝗼𝗰 to work on efficient foundation models at @InfAtEd and @EPCCed! This is part of the @ARIA_research funding for Scaling Compute: AI at 1/1000th the cost 
          
                
1 reply · 17 reposts · 30 likes
              
With SEMI 🌓, you can integrate entirely new modalities (satellite images, galaxies, inertial measurements, molecules, ...) into LLMs with as few as 32 samples!
           Multimodal models typically need millions of examples from each modality paired with text for training. With SEMI 🌓, we integrate new low-resource modalities into LLMs with as few as 32 samples — including satellite images, galaxies, sensors, and molecules. (1/6) 
            
                
0 replies · 4 reposts · 34 likes
              
             Multimodal models typically need millions of examples from each modality paired with text for training. With SEMI 🌓, we integrate new low-resource modalities into LLMs with as few as 32 samples — including satellite images, galaxies, sensors, and molecules. (1/6) 
          
                
3 replies · 39 reposts · 213 likes
              
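The tweets do not spell out how SEMI wires a new modality into the LLM, but the standard recipe they evoke (a frozen modality encoder plus a small trainable projector into the LLM's embedding space, trained on a handful of paired examples) can be sketched as below. The dimensions, projector architecture, and MSE stand-in objective are assumptions for illustration; a real system would train the projector through the LLM's language-modelling loss on the paired text.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: a frozen encoder for the new modality producing
# 768-d features, and an LLM whose token embeddings live in 4096-d space.
ENC_DIM, LLM_DIM = 768, 4096

class ModalityProjector(nn.Module):
    """Small trainable adapter mapping frozen encoder features to LLM embeddings."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(ENC_DIM, LLM_DIM),
            nn.GELU(),
            nn.Linear(LLM_DIM, LLM_DIM),
        )

    def forward(self, features):
        # features: (batch, num_patches, ENC_DIM) -> (batch, num_patches, LLM_DIM)
        return self.proj(features)

projector = ModalityProjector()
optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-4)

# 32 paired examples (e.g. 32 captioned satellite images), with random tensors
# standing in for encoder outputs and for LLM-space training targets.
features = torch.randn(32, 16, ENC_DIM)
targets = torch.randn(32, 16, LLM_DIM)

for epoch in range(100):
    loss = nn.functional.mse_loss(projector(features), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```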
             🚀 Excited to see our work on PiCSAR out! Thrilled to have Joshua as a co-author — and even more thrilled that he’ll be joining my group this academic year. Big things ahead! 
           We introduce PiCSAR (Probabilistic Confidence Selection And Ranking)💡: A simple training-free method for scoring samples based on probabilistic confidence, selecting a reasoning chain with the highest confidence from multiple sampled responses. ✏️PiCSAR is generalisable across 
            
                
0 replies · 3 reposts · 12 likes
              
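The PiCSAR description boils down to: sample several responses, score each by its probabilistic confidence, keep the highest-scoring one. A minimal training-free sketch is below; using the mean token log-probability as the confidence score is an illustrative choice and may differ from the paper's exact scoring function.

```python
def picsar_select(candidates):
    """Pick the response with the highest probabilistic confidence.

    `candidates` is a list of (text, token_logprobs) pairs, e.g. obtained by
    sampling the same prompt several times with an API that returns per-token
    log-probabilities. Mean token log-probability is used as the confidence
    score here for illustration.
    """
    def confidence(token_logprobs):
        return sum(token_logprobs) / max(len(token_logprobs), 1)

    best_text, _ = max(candidates, key=lambda c: confidence(c[1]))
    return best_text

# Toy usage with made-up log-probabilities:
samples = [
    ("The answer is 42.", [-0.2, -0.1, -0.3, -0.05]),
    ("The answer is 41.", [-1.2, -0.9, -1.5, -0.8]),
]
print(picsar_select(samples))  # -> "The answer is 42."
```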
the bitter lesson hits again -- a while back we did a systematic analysis of many ways of speeding up pre-training (https://t.co/6dQR1iLYQp, NeurIPS 2023) and, TL;DR, just tuning Adam and decaying the learning rate still gets you SOTA
           We did a very careful study of 10 optimizers with no horse in the race. Despite all the excitement about Muon, Mars, Kron, Soap, etc., at the end of the day, if you tune the hyperparameters rigorously and scale up, the speedup over AdamW diminishes to only 10% :-( Experiments 
          
                
0 replies · 3 reposts · 21 likes
              
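The point of both tweets is that a carefully tuned AdamW with a decaying learning rate remains a very strong pre-training baseline. For concreteness, a minimal PyTorch sketch of that baseline is below; the model is a stand-in and the hyperparameter values are illustrative placeholders, not the tuned settings from either study.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(512, 512)            # stand-in for a transformer

optimizer = AdamW(
    model.parameters(),
    lr=3e-4,                                 # peak learning rate (placeholder)
    betas=(0.9, 0.95),
    weight_decay=0.1,
)
scheduler = CosineAnnealingLR(optimizer, T_max=1_000, eta_min=3e-5)

for step in range(1_000):
    loss = model(torch.randn(8, 512)).pow(2).mean()   # dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()                         # decay the learning rate each step
```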
I've been awarded a Starting Grant from @ERC_Research! As part of AToM-FM ⚛️, I'll study efficient architectures for foundation models with end-to-end tokenisation and adaptive+permanent memory. Building a greener, more democratic AI.
           📣 The ERC Starting Grant call results are out! Find out which early-career researchers will receive funding, what they will be investigating, where they will be based... plus lots of other #ERCStG facts & figures for 2025! ➡️  https://t.co/cGctMhcJos  🇪🇺 #HorizonEurope
            
            
                
14 replies · 17 reposts · 142 likes
              
Apply to ELLIS if you’d like to do a PhD in NLP/ML, spending time at two different European universities!
           🎓 Interested in a #PhD in machine learning or #AI? The ELLIS PhD Program connects top students with leading researchers across Europe. The application portal opens on Oct 1st. Curious? Join our info session on the same day. Get all the info 👉  https://t.co/0Tq58uexHk 
              #ELLISPhD
            
          
                
0 replies · 2 reposts · 21 likes
              
             We introduce PiCSAR (Probabilistic Confidence Selection And Ranking)💡: A simple training-free method for scoring samples based on probabilistic confidence, selecting a reasoning chain with the highest confidence from multiple sampled responses. ✏️PiCSAR is generalisable across 
          
                
2 replies · 30 reposts · 93 likes
              
🧵7/8 Inverse Scaling in Test-Time Compute: led by @aryopg, with @haeggee, @RunjinChen, @andyarditi, Jacob Goldman-Wetzler, @KitF_T, @petrini_linda, @_julianmichael_, Beatrice Alex, @PMinervini, @yanda_chen_, @JoeJBenton, and @EthanJPerez. https://t.co/KPBOjn39qw
          
           New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵 
            
                
1 reply · 2 reposts · 9 likes
              
Our method for achieving more faithful, verifiable and robust #LLM reasoning (FLARE 💫) has been accepted at #EMNLP2025 @emnlpmeeting! Be sure to check out: https://t.co/cSHn97iLVJ Work done with the amazing @PMinervini @PSH_Lewis @pat_verga @IAugenstein
          
          
            
arxiv.org · Modern Question Answering (QA) and Reasoning approaches based on Large Language Models (LLMs) commonly use prompting techniques, such as Chain-of-Thought (CoT), assuming the resulting generation...
             👋Psst! Want more faithful, verifiable and robust #LLM reasoning than with CoT, but using external solvers is meh? Our FLARE💫uses Logic Programming with Exhaustive Simulated Search to achieve this.🧵 With @PMinervini @PSH_Lewis @pat_verga @IAugenstein
               https://t.co/cSHn97iLVJ 
            
            
                
0 replies · 7 reposts · 27 likes
              
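FLARE is described as getting faithful reasoning from logic programming with exhaustive simulated search, with the search simulated by the LLM rather than handed to an external solver. As a purely illustrative toy of the symbolic side, the snippet below computes the exhaustive forward-chaining closure of a tiny made-up knowledge base; the facts, rule encoding, and query are hypothetical and are not FLARE's actual representation.

```python
# Made-up facts and one rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
facts = {("parent", "ann", "bob"), ("parent", "bob", "carol")}
rules = [
    lambda kb: {
        ("grandparent", x, z)
        for (p1, x, y1) in kb if p1 == "parent"
        for (p2, y2, z) in kb if p2 == "parent" and y2 == y1
    },
]

def exhaustive_closure(facts, rules):
    """Apply every rule until no new facts can be derived (exhaustive search)."""
    kb = set(facts)
    while True:
        derived = set().union(*(rule(kb) for rule in rules)) - kb
        if not derived:
            return kb
        kb |= derived

kb = exhaustive_closure(facts, rules)
print(("grandparent", "ann", "carol") in kb)  # -> True
```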