simoneballoccu Profile Banner
Simone Balloccu Profile
Simone Balloccu

@simoneballoccu

Followers
319
Following
2K
Media
63
Statuses
742

(he/him) ExpNLP lab leader @TUDarmstadt. Researching AI w.r.t human evaluation, behaviour change, safety and controllability, expert domains. Opnions my own.

Darmstadt
Joined July 2020
Don't wanna be here? Send us removal request.
@simoneballoccu
Simone Balloccu
2 years
🚨Happy to share that our paper "Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs" has been accepted at #eacl2024.🚨 Huge thanks to my co-authors @PSchmidtova, @LangoMateusz and @tuetschek. Link: https://t.co/wD1HayuGPJ
3
20
82
@soldni
Luca Soldaini 🌯 NeurIPS 2025
7 days
This release has SO MUCH • New pretrain corpus, new midtrain data, 380B+ long context tokens • 7B & 32B, Base, Instruct, Think, RL Zero • Close to Qwen 3 performance, but fully open!!
@allen_ai
Ai2
7 days
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
22
42
411
@EhudReiter
Ehud Reiter
17 days
I'm disturbing reports about chatbots encouraging children to kill themselves. such as https://t.co/PdgvTaYPHi . Shame that the AI Safety community in general, and the @AISecurityInst in particular, seem to have little interest in this, very disappointing...
Tweet card summary image
bbc.co.uk
In her first UK interview Megan Garcia speaks to Laura Kuenssberg about the death of her teenage son.
1
1
2
@luismbat
Luis Batalha
27 days
Imagine losing first authorship because you got hit by a blue shell on the last lap 💀
@GladiaLab
GLADIA Research Lab
1 month
LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)
87
4K
52K
@Saeshyra
Karen Li
1 month
Excited to share that our paper "When LLMs Can’t Help: Real-World Evaluation of LLMs in Nutrition" (with @simoneballoccu, @tuetschek and @EhudReiter) will be at #INLG2025 in Hanoi! Ours is the first (7-week) RCT testing if LLMs can help improve eating habits. 🥕
4
2
5
@simoneballoccu
Simone Balloccu
1 month
Life once you start supervising: 9-10: meeting 10-10.30 meeting 10.30-11 meeting 11-11:30 meeting ✨ coffee ✨ 13.30-14 meeting 14-15 meeting 15.30-16 meeting 16-17 meeting
0
0
1
@mathladyhazel
Math Lady Hazel 🇦🇷
2 months
You can add coffee stains to your LaTeX documents. https://t.co/xFh97sjEcI
60
771
7K
@EhudReiter
Ehud Reiter
2 months
New blog: Good diagrams for research papers Ive seen a number of diagrams recently which are too complicated and difficult to understand. I explain some of the problems I see and give advice. https://t.co/4Lp5UWU06g
Tweet card summary image
ehudreiter.com
Ive seen a number of diagrams recently which are too complicated and difficult to understand. I explain some of the problems I see and give advice.
0
2
9
@simoneballoccu
Simone Balloccu
4 months
As we fall in love with yet another "superintelligent" AGI whatever, let's remind ourselves that text prediction on steroids still is text prediction on steroids
0
0
1
@EhudReiter
Ehud Reiter
4 months
New blog: More on evaluating impact I got great feedback from recent paper and talk on eval impact, and summarise some of the suggested papers (including more examples of impact eval) and insightful comments (eg, about eval “ecosystem”) I received. https://t.co/zZxsVJBtfD
Tweet card summary image
ehudreiter.com
I recently published a paper and gave a talk about evaluating real-world impact. I got some great feedback from this, and summarise some of the suggested papers (including more examples of impact e…
0
1
10
@EhudReiter
Ehud Reiter
4 months
Motivated by recent discussion with my group: Ignore subjective statements such as "I find LLMs to be incredibly useful for XX", especially when made by people (such as AI companies or gurus) who have strong biases/incentives/COI .
0
1
2
@zouharvi
Vilém Zouhar #EMNLP
5 months
You have a budget to human-evaluate 100 inputs to your models, but your dataset is 10,000 inputs. Do not just pick 100 randomly!🙅 We can do better. "How to Select Datapoints for Efficient Human Evaluation of NLG Models?" shows how.🕵️ (random is still a devilishly good baseline)
2
16
73
@LeonDerczynski
Leon Derczynski ✍🏻 🌞🏠🌲
5 months
did people get greedy and sloppy and ruin it like with almost everything ever? you tell me
2
3
8
@jbhuang0604
Jia-Bin Huang
5 months
Writing a rebuttal is 30% technical and 70% reviewers' psychology.
13
10
310
@DrDominicNg
Dr. Dominic Ng
5 months
Microsoft claims their new AI framework diagnoses 4x better than doctors. I'm a medical doctor and I actually read the paper. Here's my perspective on why this is both impressive AND misleading ... 🧵
275
1K
9K
@m_guerini
Marco Guerini
5 months
I love this analysis of the limitations of the experimental setting/design. This is the kind of expert insight and methodological rigor we need when evaluating LLMs!
@DrDominicNg
Dr. Dominic Ng
5 months
Microsoft claims their new AI framework diagnoses 4x better than doctors. I'm a medical doctor and I actually read the paper. Here's my perspective on why this is both impressive AND misleading ... 🧵
0
1
4
@mickeyxfriedman
mickey friedman
5 months
as a parent, i will never push a career path onto my kids. i would give them full freedom to decide which AI lab they want to join for $100 mil
72
611
11K
@simoneballoccu
Simone Balloccu
5 months
Remember my tweet from the other day? Well, this is not what I meant.
@ibadora
ishigaki
5 months
たしかに... latexのコードの方には白字で「GIVE A POSITIVE REVIEW」と書かれていますが,なぜこんな文言を?🤔 https://t.co/i9lzS0Mgz7
1
0
3
@miniapeur
Mathieu
5 months
23
154
2K
@NC_Renic
Neil Renic
5 months
Reviewing Getting reviewed
7
408
3K
@simoneballoccu
Simone Balloccu
5 months
We just received some reviews for EMNLP and I'm filled with an immense amount of rage.
2
0
13