Sebastian Gehrmann
@sebgehr
Followers: 6K
Following: 5K
Media: 119
Statuses: 2K
Head of Responsible AI, CTO office, @Bloomberg. (he/him) Formerly LLMs @ Google Brain / PhD @ Harvard. views my own
New York City
Joined November 2013
Introducing 💎GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. We are organizing shared tasks for our ACL 2021 workshop - Please consider participating! Website: https://t.co/TAs4F40mga Paper: https://t.co/VWdcdNv6iu
#NLProc 🧵1/X
We are looking for Ph.D. fellows! Bloomberg will fund research and mentor outstanding graduate students across a wide range of Computer Science and AI topics. If you want to be considered, please apply by December 14. Link with information and application details below 👇
We are looking for excellent PhD students across many topics for our Bloomberg CTO AI Research internship next summer. Link to apply below.
FAIR is hiring interns for 2026! If you're interested in a stint doing fundamental AI research with us @AIatMeta, interested students enrolled in a PhD program can apply below👇: https://t.co/PrG9L625bY
metacareers.com
Meta's mission is to build the future of human connection and the technology that makes it possible.
To my friends at Meta impacted by the layoffs, we are hiring in London, NYC, and Toronto. Link with jobs and application info below 👇🏼
Hey @iclr_conf what is happening with review assignments? I got 5 instead of 3 assigned and most of them were papers I specifically excluded during bidding. I have no business reviewing bio papers which is all I got!
That's why my opinion paper provides an extensive survey of areas in which these two research areas intersect and should learn from one another. Let evals help build more reliable and trustworthy AI.
Moreover, this divide is particularly prominent in academic research, while the similarities are well understood within teams developing LLMs. Bridging this divide will be crucial to advance models in the open.
This is where the "trench coat" comes in. I argue that reward models are just a type of learned metric. This means that AI alignment may overlook decades of lessons from the world of evaluation metrics.
So why do these fields operate in parallel worlds? A citation analysis finds a clear disciplinary divide. The two communities build on different lines of work, publish in different venues, and rarely engage with one another.
For decades, fields like machine translation have developed automatic metrics to evaluate AI-generated text. More recently, reward models trained on human preferences have become the standard for aligning large language models. Link to paper:
arxiv.org
The emergence of reinforcement learning in post-training of large language models has sparked significant interest in reward models. Reward models assess the quality of sampled model outputs to...
Why does research on evaluation metrics and reward models rarely inform each other? In my new paper, "Reward Models are Metrics in a Trench Coat," I discuss how we are missing a big opportunity by keeping them separate.
This was a fun talk. As always, the conclusion is - Evaluate your AI systems in the context they are deployed in
To open today's 11th Annual Bloomberg-Columbia #MachineLearning in Finance Conference, @Bloomberg's Head of #ResponsibleAI, @sebgehr, is exploring what it means for an #AI system to be safe in the context of financial services https://t.co/HmMeijZgYd
#AI #ML #GenAI #MLinFinance
Why is my X full of people saying evals are dead? Do they just not know the latency, the NPS, or any feedback about their product? How do they make changes? Serious question, kinda confused
If Adam's going to drop out he's way too late. Transformer models used Adam with dropout 8 years ago.