Andrei Lupu

@_andreilupu

Followers 710 · Following 2K · Media 40 · Statuses 258

DPhil student @FLAIR_Ox and @AIatMeta. Previously @Mila_Quebec and @rllabmcgill. Theory of Mind / Coordination / Rainbow Teaming 🌈. Opinions my own.

Joined December 2016
@_andreilupu
Andrei Lupu
29 days
Theory of Mind (ToM) is crucial for next-gen LLM agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox. 🧵👇
4
30
103
@_andreilupu
Andrei Lupu
2 hours
RT @AlexDGoldie: 1/ 🕵️ Algorithm discovery could lead to huge AI breakthroughs! But what is the best way to learn or discover new algorithm….
0
22
0
@_andreilupu
Andrei Lupu
5 days
RT @uljadb99: Unlock real diversity in your LLM! 🚀. LLM outputs can be boring and repetitive. Today, we release Intent Factored Generation….
0
8
0
@_andreilupu
Andrei Lupu
8 days
"You can just do things," if you don't care how this will affect society at large.
@EugeneVinitsky
Eugene Vinitsky 🍒🦋
10 days
How can you be actively working on an AI girlfriend and not think less of yourself? What is your moral justification for your work making the world better?
0
0
6
@_andreilupu
Andrei Lupu
18 days
RT @MartinJosifoski: Scaling AI research agents is key to tackling some of the toughest challenges in the field. But what's required to sca….
0
32
0
@_andreilupu
Andrei Lupu
18 days
RT @yorambac: AI Research Agents are becoming proficient at machine learning tasks, but how can we help them search the space of candidate….
0
65
0
@_andreilupu
Andrei Lupu
23 days
Biology is computable, and evolution is exploitable! 🧬 @SebastianTower6 and @OlaKalisz8 have taken opponent shaping out of the petri dish of MARL environments and applied it to the super crucial problem of antibody design. 🧫 Check out their work below!
@OlaKalisz8
Ola Kalisz
23 days
Antiviral therapy design is myopic 🦠🙈: optimised only for the current strain. That's why you need a different flu vaccine every year! Our #ICML2025 paper ADIOS proposes "shaper therapies" that steer viral evolution in our favour & remain effective. Work done at @FLAIR_Ox. 🧵👇
0
1
7
@_andreilupu
Andrei Lupu
25 days
RT @MinqiJiang: Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then sur….
0
194
0
@_andreilupu
Andrei Lupu
25 days
Most AI labs don't try to build AI for normal people. They try to build the AI that will build AI for normal people (and for everything else). Which isn't to say that memory isn't important.
@jxmnop
jxmo
26 days
seems big AI labs are hyperfixating on reasoning when they should focus on *memory* instead. normal people won't use models that can think for hours to solve hard math problems. people want models that learn over time, remember details, adapt and interact like a person would.
0
0
5
@_andreilupu
Andrei Lupu
26 days
RL truly is here to stay.
@shizhediao
Shizhe Diao
2 months
Does RL truly expand a model’s reasoning 🧠 capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model 💥 and offering…
0
0
1
@_andreilupu
Andrei Lupu
28 days
RT @OlaKalisz8: Very cool LLM benchmark based on the game - Decrypto. It shows some surprising shortcomings of the current LLM models. But….
0
1
0
@_andreilupu
Andrei Lupu
28 days
RT @_samvelyan: Much-needed multi-agent benchmark for LLMs 👥. Theory of Mind is key as LLMs act in agentic, interactive settings — yet rema….
0
3
0
@_andreilupu
Andrei Lupu
29 days
RT @j_foerst: Multi-agent interactions are the new frontier of AI and the ability to make sense of others (i.e. "theory of mind") is at the….
0
14
0
@_andreilupu
Andrei Lupu
29 days
Luckily, we are already working to make them better! To learn more, have a look at our website 🔗 and paper 📜, or get our code 💽 and play Decrypto with your LLM of choice in just a few minutes!
1
0
6
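Not the official release, just an illustration: a minimal sketch of what "playing Decrypto with your LLM of choice" could look like, assuming a thin agent wrapper. The `LLMAgent` class, its `act` method, and the prompt format are hypothetical placeholders, not the interface of the actual codebase.

```python
# Hypothetical sketch only: a thin wrapper turning Decrypto observations into
# prompts for an arbitrary chat LLM. Class and method names are illustrative,
# not the released Decrypto API.

class LLMAgent:
    def __init__(self, llm, role: str):
        self.llm = llm    # any callable: prompt string -> completion string
        self.role = role  # "Alice" (hinter), "Bob" (guesser) or "Eve" (interceptor)

    def act(self, observation: str) -> str:
        prompt = (
            f"You are {self.role} in the game Decrypto.\n"
            f"Game state so far:\n{observation}\n"
            "Respond with your hints (Alice) or your guess (Bob/Eve)."
        )
        return self.llm(prompt)

# Usage sketch: alice = LLMAgent(my_llm, "Alice"); hints = alice.act(state_text)
```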
@_andreilupu
Andrei Lupu
29 days
This shows a double failure of ToM. First, models fail to reason from Eve's perspective when asked. Second, if a model truly believed that Eve would intercept the hints, it should have chosen different hints! This paints an abysmal picture of ToM abilities in LLMs!
1
1
6
@_andreilupu
Andrei Lupu
29 days
Shockingly, reasoning models will predict that Eve will guess correctly. 💀 Even on the first turn. ☠️ Even if we emphasize that Eve "does *NOT* know the secret keywords". Only Llama correctly states that Eve can do no better than random on the first turn.
1
0
5
@_andreilupu
Andrei Lupu
29 days
The Perspective Taking task is simple: after Alice chooses her hints, we ask her to predict Eve's guess. Most models predict that Eve will guess correctly almost 100% of the time (vs. ~52% in reality), failing to consider Eve's perspective and the information actually available to her.
1
0
4
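A back-of-the-envelope sketch of how the gap above (predicted ~100% vs. ~52% actual interceptions) could be computed; the episode interface (`alice_predicts_intercept`, `eve_intercepts`) is an assumption for illustration, not the paper's code.

```python
# Hypothetical sketch: compare Alice's predicted interception rate with Eve's
# actual interception rate across episodes of the Perspective Taking probe.
# The episode methods below are assumed placeholders, not the Decrypto API.

def perspective_taking_gap(episodes):
    """Return (predicted_rate, actual_rate) over a list of episodes.

    Each episode is assumed to expose:
      - alice_predicts_intercept() -> bool: Alice's prediction after hinting
      - eve_intercepts() -> bool: whether Eve actually guessed the code
    """
    n = len(episodes)
    predicted = sum(ep.alice_predicts_intercept() for ep in episodes) / n
    actual = sum(ep.eve_intercepts() for ep in episodes) / n
    return predicted, actual

# Per the thread, a calibrated model would land near (0.52, 0.52);
# most models reportedly land closer to (1.0, 0.52).
```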
@_andreilupu
Andrei Lupu
29 days
We construct two such experiments to assess three key ToM abilities:
🎭 Representational Change
🧠 False Belief
👁️ Perspective Taking
We find very strong evidence that LLMs perform poorly at all three. Surprisingly, SotA models like Claude 3.7 and o1-high perform even worse!
1
0
4
@_andreilupu
Andrei Lupu
29 days
💭 Decrypto is also a flexible platform for conducting interactive ToM experiments inspired by seminal works in cognitive science! With only a few lines of code, we can systematically probe an agent's beliefs about others and gain insights into its decision-making.
1
0
4
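To make the "few lines of code" claim concrete, here is a hedged sketch of what such a belief probe might look like; `game.transcript_for(...)` and `llm.chat(...)` are assumed interfaces, not the actual Decrypto platform.

```python
# Hypothetical sketch of an interactive belief probe: pause the game, show one
# agent only what it can see, and ask it about another agent's knowledge.
# The `game` and `llm` interfaces are assumptions, not the released codebase.

def probe_belief(llm, game, about_role: str = "Eve") -> dict:
    visible_history = game.transcript_for("Alice")  # Alice's view only
    question = (
        f"Given the game so far, what does {about_role} know about the secret "
        f"keywords, and what will {about_role} guess next? Answer briefly."
    )
    answer = llm.chat(
        system="You are Alice in a game of Decrypto.",
        messages=[*visible_history, {"role": "user", "content": question}],
    )
    return {"turn": game.turn, "about": about_role, "answer": answer}
```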
@_andreilupu
Andrei Lupu
29 days
We also collect hints from 👥 human games and evaluate LLMs in the roles of Eve and Bob. We find that LLMs struggle to interpret hints the way human players do, leading to many miscommunications. Claude 3.7 fares better than other models, but is still far from human-level win rates.
1
0
4
@_andreilupu
Andrei Lupu
29 days
So how do they fare? Not great! We evaluate:
1️⃣ Ad-hoc coordination (fix Eve, try different Alice-Bob pairs)
2️⃣ Competition (Alice and Bob are the same model, playing against different Eves)
In both settings, larger models stand out, but none come close to our simple word-embedding baselines.
1
0
4
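For concreteness, a rough sketch of the two evaluation grids described above; `play_match`, the model names, and the win-rate return value are placeholder assumptions, not the paper's harness.

```python
# Hypothetical sketch of the two evaluation settings above. Everything here is
# a placeholder: swap in real model endpoints and a real game runner.
from itertools import product

models = ["model_a", "model_b", "model_c"]     # placeholder model names
baseline_eve = "word_embedding_baseline"       # fixed interceptor for setting 1


def play_match(alice: str, bob: str, eve: str) -> float:
    """Placeholder: run a batch of Decrypto games, return the Alice-Bob win rate."""
    return 0.0  # replace with a call to an actual game runner


# 1️⃣ Ad-hoc coordination: fix Eve, cross every Alice with every Bob.
adhoc = {(a, b): play_match(alice=a, bob=b, eve=baseline_eve)
         for a, b in product(models, models)}

# 2️⃣ Competition: Alice and Bob share a model, Eve varies across models.
competition = {(team, eve): play_match(alice=team, bob=team, eve=eve)
               for team, eve in product(models, models) if team != eve}
```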