Simone Balloccu @simoneballoccu X Profile

Simone Balloccu

@simoneballoccu

Followers

319

Following

2K

Media

63

Statuses

742

(he/him) ExpNLP lab leader @TUDarmstadt. Researching AI w.r.t. human evaluation, behaviour change, safety and controllability, expert domains. Opinions my own.

https://t.co/KLfuf9PJWK

Darmstadt

Joined July 2020


Simone Balloccu

@simoneballoccu

2 years

🚨Happy to share that our paper "Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs" has been accepted at #eacl2024.🚨 Huge thanks to my co-authors @PSchmidtova, @LangoMateusz and @tuetschek. Link: https://t.co/wD1HayuGPJ

3

20

82

Luca Soldaini 🌯 NeurIPS 2025

@soldni

7 days

This release has SO MUCH
• New pretrain corpus, new midtrain data, 380B+ long context tokens
• 7B & 32B, Base, Instruct, Think, RL Zero
• Close to Qwen 3 performance, but fully open!!

Ai2

@allen_ai

7 days

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵

22

42

411

Ehud Reiter

@EhudReiter

17 days

I'm seeing disturbing reports about chatbots encouraging children to kill themselves, such as https://t.co/PdgvTaYPHi. Shame that the AI Safety community in general, and the @AISecurityInst in particular, seem to have little interest in this, very disappointing...

bbc.co.uk

In her first UK interview Megan Garcia speaks to Laura Kuenssberg about the death of her teenage son.

1

2

Luis Batalha

@luismbat

27 days

Imagine losing first authorship because you got hit by a blue shell on the last lap 💀

GLADIA Research Lab

@GladiaLab

1 month

LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)

87

4K

52K

Karen Li

@Saeshyra

1 month

Excited to share that our paper "When LLMs Can’t Help: Real-World Evaluation of LLMs in Nutrition" (with @simoneballoccu, @tuetschek and @EhudReiter) will be at #INLG2025 in Hanoi! Ours is the first (7-week) RCT testing if LLMs can help improve eating habits. 🥕

4

2

5

Simone Balloccu

@simoneballoccu

1 month

Life once you start supervising:
9-10: meeting
10-10.30 meeting
10.30-11 meeting
11-11:30 meeting
✨ coffee ✨
13.30-14 meeting
14-15 meeting
15.30-16 meeting
16-17 meeting

0

1

Math Lady Hazel 🇦🇷

@mathladyhazel

2 months

You can add coffee stains to your LaTeX documents. https://t.co/xFh97sjEcI

60

771

7K

Ehud Reiter

@EhudReiter

2 months

New blog: Good diagrams for research papers. I've seen a number of diagrams recently which are too complicated and difficult to understand. I explain some of the problems I see and give advice. https://t.co/4Lp5UWU06g

ehudreiter.com


0

2

9

Simone Balloccu

@simoneballoccu

4 months

As we fall in love with yet another "superintelligent" AGI whatever, let's remind ourselves that text prediction on steroids still is text prediction on steroids

0

1

Ehud Reiter

@EhudReiter

4 months

New blog: More on evaluating impact. I got great feedback from a recent paper and talk on eval impact, and summarise some of the suggested papers (including more examples of impact eval) and insightful comments (e.g., about the eval “ecosystem”) I received. https://t.co/zZxsVJBtfD

ehudreiter.com

I recently published a paper and gave a talk about evaluating real-world impact. I got some great feedback from this, and summarise some of the suggested papers (including more examples of impact e…

0

1

10

Ehud Reiter

@EhudReiter

4 months

Motivated by a recent discussion with my group: Ignore subjective statements such as "I find LLMs to be incredibly useful for XX", especially when made by people (such as AI companies or gurus) who have strong biases/incentives/COI.

0

1

2

Vilém Zouhar #EMNLP

@zouharvi

5 months

You have a budget to human-evaluate 100 inputs to your models, but your dataset is 10,000 inputs. Do not just pick 100 randomly!🙅 We can do better. "How to Select Datapoints for Efficient Human Evaluation of NLG Models?" shows how.🕵️ (random is still a devilishly good baseline)

2

16

73

Leon Derczynski ✍🏻 🌞🏠🌲

@LeonDerczynski

5 months

did people get greedy and sloppy and ruin it like with almost everything ever? you tell me

2

3

8

Jia-Bin Huang

@jbhuang0604

5 months

Writing a rebuttal is 30% technical and 70% reviewers' psychology.

13

10

310

Dr. Dominic Ng

@DrDominicNg

5 months

Microsoft claims their new AI framework diagnoses 4x better than doctors. I'm a medical doctor and I actually read the paper. Here's my perspective on why this is both impressive AND misleading ... 🧵

275

1K

9K

Marco Guerini

@m_guerini

5 months

I love this analysis of the limitations of the experimental setting/design. This is the kind of expert insight and methodological rigor we need when evaluating LLMs!

Dr. Dominic Ng

@DrDominicNg

5 months

Microsoft claims their new AI framework diagnoses 4x better than doctors. I'm a medical doctor and I actually read the paper. Here's my perspective on why this is both impressive AND misleading ... 🧵

0

1

4