Valentina Pyatkin

@valentina__py

Followers
3K
Following
7K
Media
58
Statuses
759

Postdoc at the Allen Institute for AI @allen_ai and @uwnlp

Zürich
Joined October 2016
@adam_mahdi_
Adam Mahdi
7 days
Our new @NeurIPSConf paper: Measuring What Matters📄 We reviewed 445 LLM benchmarks from top AI conferences and found systematic weaknesses in: 1️⃣ Statistical rigour 2️⃣ Concept definition 3️⃣ Dataset construction blog + paper 👇 https://t.co/orubCJ3V8G
1
10
42
@turingmusician
Jonathan Bragg
6 days
Agent benchmarks don't measure true *AI* advances We built one that's hard & trustworthy 👉AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems 👉SOTA results across 22 agent *classes* 👉AgentBaselines agents suite 🆕 https://t.co/BFjdGCAp1w 🧵👇
arxiv.org
AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry;...
4
21
28
@rkdsaakyan
Arkadiy Saakyan
8 days
N-gram novelty is widely used as a measure of creativity and generalization. But if LLMs produce highly n-gram novel expressions that don’t make sense or sound awkward, should they still be called creative? In a new paper, we investigate how n-gram novelty relates to creativity.
1
13
44
@yanaiela
Yanai Elazar
9 days
This semester I’m teaching a seminar on data attribution. As researchers, it’s always gratifying when someone reads your paper, let alone an entire class! But we rarely get to hear about it. So this thread is a shoutout to the papers and authors we’ve read and discussed in class.
3
3
77
@clefourrier
Clémentine Fourrier 🍊
13 days
New HF bible just out!! Learn anything you need to train amazing LLMs (from the combined work of our science teams): data, pre-training, post-training, evals, infra, and way more! https://t.co/pRBAeA2FQn Congrats to the amazing @LoubnaBenAllal1 who led this effort! 🤩
2
4
19
@valentina__py
Valentina Pyatkin
13 days
oh no..
@soldni
Luca Soldaini 🎀
15 days
too many people saying “look at your data”, not enough “look at your model outputs”
0
1
10
@paul_rottger
Paul Röttger @ EMNLP
14 days
There’s plenty of evidence for political bias in LLMs, but very few evals reflect realistic LLM use cases — which is where bias actually matters. IssueBench, our attempt to fix this, is accepted at TACL, and I will be at #EMNLP2025 next week to talk about it! New results 🧵
@paul_rottger
Paul Röttger @ EMNLP
9 months
Are LLMs biased when they write about political issues? We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before. Long 🧵with spicy results 👇
1
6
20
@valentina__py
Valentina Pyatkin
15 days
more details and registration here:
ai.ethz.ch
0
0
0
@valentina__py
Valentina Pyatkin
16 days
I will be giving a talk at @ETH_AI_Center next week, on RLVR for verifiable instruction following, generalization, and reasoning! 📢 Join if you are in Zurich and interested in hearing about IFBench and our latest Olmo and Tülu works at @allen_ai
3
10
103
@FSchaipp
Fabian Schaipp
16 days
What are good optimizers for diffusion models? 🍂 TLDR: Muon and SOAP are very good. Paper: https://t.co/TYqRpfcu5t
7
45
332
@mariusmosbach
Marius Mosbach
18 days
Come talk to @Ara_Krishnan and me about our recent paper on frequency effects of unlearning and how @allen_ai 's Olmo model and toolkit made this work so much easier. 🚀
@allen_ai
Ai2
18 days
Olmo isn’t just open weights—it’s an open research stack. Try it in the Ai2 Playground: https://t.co/qGd4UW8ALv AMA on Discord: Tues, Oct 28 @ 8:00 AM PT with some of the researchers behind these studies + an Ai2 Olmo teammate. Join: https://t.co/GnxLPhM3MW
1
6
19
@valentina__py
Valentina Pyatkin
20 days
Thank you to @Ale_Raganato for hosting me in Milano and for listening to me talk about verifiable constraints and RLVR!
0
1
33
@hamishivi
Hamish Ivison
20 days
Cool to see that Tinker has Tulu 3 SFT as an example in their cookbook :) https://t.co/D09igpMEJG
0
9
36
@yinghui_he_
Yinghui He
23 days
Claude Skills shows performance benefits from leveraging LLM skill catalogs at inference time. Our previous work (linked under thread 5/5) showed the same 6 months ago! 🌟Our new work, STAT, shows that leveraging skills during training can greatly help too‼️, e.g., Qwen can
8
42
198
@cervisiarius
Bob West
26 days
🚨New paper alert! 🚨 Tandem Training for Language Models https://t.co/Emzcgf1KHx Actions & thoughts of AI w/ superhuman skills will be hard for humans to follow, undermining human oversight of AI. We propose a new way to make AI produce human-understandable solutions. How?👉🧵
4
23
67
@m2saxon
Michael Saxon
26 days
𝑵𝒆𝒘 𝒃𝒍𝒐𝒈𝒑𝒐𝒔𝒕! In which I give some brief reflections on #COLM2025 and give a rundown of a few great papers I checked out!
5
24
145
@valentina__py
Valentina Pyatkin
27 days
Go work with Jonathan! I’m sure he’ll be a fantastic advisor!
@schwarzjn_
Jonathan Richard Schwarz
28 days
🎺 Big personal news: I've joined @imperialcollege as a Visiting Professor! 🎓Excited to collaborate with brilliant colleagues and students. If you're interested in a Machine Learning PhD, please reach out 📨 More exciting news to follow soon...
0
0
15
@benno_krojer
Benno Krojer
28 days
@tvergarabrowne and I were quiet over the summer with our podcast "Behind the Research of AI"... But now we're back! And with an awesome guest! We interviewed @jxmnop during @COLM_conf and had a blast chatting, eating snacks together and reflecting on phd life/research ideas
1
12
33
@valentina__py
Valentina Pyatkin
1 month
and that’s a wrap of COLM and SoLaR!
@valentina__py
Valentina Pyatkin
1 month
💡We kicked off the SoLaR workshop at #COLM2025 with a great opinion talk by Michelle Ding & Jo Gasior Kavishe (joint work with Victor Ojewale and @SureshVenkat46) on "Testing LLMs in a sandbox isn't responsible. Focusing on community use and needs is."
0
4
69
@valentina__py
Valentina Pyatkin
1 month
@tsvetshop Now poster session! 📰
0
1
1