Hannah Rose Kirk Profile Banner
Hannah Rose Kirk Profile
Hannah Rose Kirk

@hannahrosekirk

Followers
3,118
Following
739
Media
65
Statuses
551

AI researcher trying to make sense of all things cyberspace 🤖 Uni of Ox PhD (loading…) @oiioxford & @OxfordAI . Prev @turinginst & @Cambridge_Uni . Visitor @ NYU

Joined June 2012
Don't wanna be here? Send us removal request.
Explore trending content on Musk Viewer
Pinned Tweet
@hannahrosekirk
Hannah Rose Kirk
5 months
Today we're launching PRISM, a new resource to diversify the voices contributing to alignment. We asked 1500 people around the world for their stated preferences over LLM behaviours, then we observed their contextual preferences in 8000 convos with 21 LLMs
Tweet media one
21
97
418
@hannahrosekirk
Hannah Rose Kirk
2 years
Gutted @DeepMind has paused internship cycles for this year (espesh after making it to final round🥲) Internships are a great way for early stage academics to get a feel for industry - the tech hiring freeze is hitting hard😑 Buuut I'm now back on the market so hmu with ideas👀😅
20
18
513
@hannahrosekirk
Hannah Rose Kirk
5 months
Determined to write my entire PhD in this sunny corner in my slippers 🤘😎
Tweet media one
15
4
384
@hannahrosekirk
Hannah Rose Kirk
5 months
Published in Nature Machine Intelligence today, our new article explores the trade-offs of personalised alignment in large language models ⚖️ Personalisation has potential to democratise decisions over how LLMs behave, but brings its own set of risks...
Tweet media one
4
51
228
@hannahrosekirk
Hannah Rose Kirk
2 years
🗣️Interested in football? 👀 We ( @turinginst x @Ofcom ) have analysed over 2 mil tweets from the 21-22 English Premier League season⚽ We created state-of-the-art AI to detect and track abuse towards players 🤖 Here's a 🧵on our methods and findings \n
8
42
194
@hannahrosekirk
Hannah Rose Kirk
2 years
✨New preprint (w/ Jakob Mökander, @jonasschuett and @Floridi ) ✨ In this paper, we propose a policy framework for auditing LLMs by breaking down responsibilities at the governance-, model- and application-level. 🧵
Tweet media one
8
33
170
@hannahrosekirk
Hannah Rose Kirk
2 years
🚨 New paper and datasets! 🚨 After sitting on my hands for many months 😬 I'm delighted that our #Hatemoji paper is going to @naaclmeeting ! 😍🤩😎🆒 In a nutshell 🥜it uses human-and-model-in-the-loop learning 🤖🤝🙆 to tackle emoji-based hate A 🧵 on all our new resources 1/
6
25
158
@hannahrosekirk
Hannah Rose Kirk
1 year
🚨Life Update!🚨 I'm starting year 3 of my PhD @UniofOxford on diversity + bias in empirical alignment of LLMs.. But I'm in NYC for fall as a visiting academic @nyuniversity !🎃🍂💜 collaborating with @hhexiy 's lab + @sleepinyourhat 's Alignment Research Group (excitement levels👇)
Tweet media one
7
2
143
@hannahrosekirk
Hannah Rose Kirk
3 months
🌎Introducing LINGOLY, our new reasoning benchmark that stumps even top LLMs (best models only reach ~35% accuracy)🥴 In a colab between @UniofOxford , @Stanford and UK Linguistic Olympiad puzzle authors, we stress test LLMs on over 90 low-resource and extinct languages...
Tweet media one
3
35
139
@hannahrosekirk
Hannah Rose Kirk
2 years
New #EMNLP2022 paper! Do you research online harms, misinformation or negative biases? Could your datasets contain examples of harmful text? 🤔 If yes, read our paper! 🫵 Shoutout to my brilliant co-authors: @Abebab , @LeonDerczynski & @bertievidgen A 🧵
4
24
107
@hannahrosekirk
Hannah Rose Kirk
11 months
How does human feedback steer LLM behaviours?🧐 Whose voices dominate? 🗣️What challenges remain and how can we do better as a community in the future?🔮 All these questions and more answered in our new survey paper, accepted at #EMNLP23 ! a small 🧵
Tweet media one
3
16
109
@hannahrosekirk
Hannah Rose Kirk
2 years
Can AI tell us why something is sexist online?🧐 Our new @SemEvalWorkshop #NLProc task invites you to create systems that identify sexist content and explain why with fine-grained predictions🔎 Check out our competition, organised by @rewire_online & sponsored by @MetaAI . Link⬇️
Tweet media one
2
17
104
@hannahrosekirk
Hannah Rose Kirk
10 months
Hi 🌎! I've arrived at @NeurIPSConf 🫡 Reach out if you wanna talk all things human feedback + sociotechical alignment. I’m presenting this cute poster, but we’re also building an awesome new human feedback dataset (release in Jan 👀) that I can’t wait to tell everyone about🕺
Tweet media one
3
3
82
@hannahrosekirk
Hannah Rose Kirk
2 years
I never knew I needed this framed in my bedroom before now #dalle2
Tweet media one
1
7
67
@hannahrosekirk
Hannah Rose Kirk
4 months
Our recent project (PRISM, ) was supported by this program 🌈🙏 Academic access to SOTA LLMs is a vital cog in the machine of high-quality third-party research in safety & alignment ⚙️ @CohereForAI setting a good example here in global and domain coverage🤘
@CohereForAI
Cohere For AI
4 months
To date we have awarded 75+ grants to researchers across 18 countries, supporting them to access state-of-art language models for their projects. 🌍 Read more in our latest blog post:
Tweet media one
1
10
40
0
11
62
@hannahrosekirk
Hannah Rose Kirk
2 years
I am @ #NAACL2022 for my first **ever** in-person conference 🤩 I'll be talking about hate + emoji = #Hatemoji (Sesh 8, 13/07) Come say hi if you want to swap favourite emojis 🦄🫠🚀❤️‍🔥🆒👨‍🎤🍜or discuss how these little pictures pose challenges for language modelling 🤟
Tweet media one
2
3
59
@hannahrosekirk
Hannah Rose Kirk
2 years
A couple months ago I submitted my first ever grant proposal😬I'm delighted it was accepted by @DynabenchAI & @MetaAI ! We'll be working on improving feedback for models-and-humans-in-the-loop with Adversarial NERDs - Novel, Efficient, Realistic Datasets🤓
3
5
59
@hannahrosekirk
Hannah Rose Kirk
9 months
Didn’t you know? The last day of @NeurIPSConf is “come dressed as your poster” day 🤭🤭
@hannahrosekirk
Hannah Rose Kirk
10 months
Hi 🌎! I've arrived at @NeurIPSConf 🫡 Reach out if you wanna talk all things human feedback + sociotechical alignment. I’m presenting this cute poster, but we’re also building an awesome new human feedback dataset (release in Jan 👀) that I can’t wait to tell everyone about🕺
Tweet media one
3
3
82
0
0
58
@hannahrosekirk
Hannah Rose Kirk
3 months
Looking forward to this! I'll be speaking about the gnarly concept of alignment and how it morphed over the past decade, starting as narrow clean def of operator-intent alignment and evolving into a messy sociotechnical usage (for better or for worse, you choose!)...Come join! 👐
@CohereForAI
Cohere For AI
3 months
The community-led AI Safety & Alignment group is hosting a recipient of the Cohere for AI Research Grant Program! @hannahrosekirk will be giving a presentation on "What Does it Mean for AI to be Aligned?" June 27th! ✨ Learn more:
Tweet media one
2
5
23
2
4
58
@hannahrosekirk
Hannah Rose Kirk
7 months
My big ask of 2024: how can we design systems of human feedback which are sustainable, truly participatory (not extractive), privacy-preserving & inclusive of pluralistic values? Lots of attention on open models but what about post-training data? A passionate🧵+ call to action👇
@ylecun
Yann LeCun
1 year
Human feedback for open source LLMs needs to be crowd-sourced, Wikipedia style. It is the only way for LLMs to become the repository of all human knowledge and cultures. Who wants to build the platform for this?
207
347
2K
3
4
54
@hannahrosekirk
Hannah Rose Kirk
2 years
💫 New paper! Abuse detection tools often prioritise efficacy, ignoring efficiency🤔But labelling data is expensive, complex and risks psychological harm👎 We argue for the importance of data efficiency, demonstrated with transformers-based #activelearning
Tweet media one
1
6
54
@hannahrosekirk
Hannah Rose Kirk
3 months
Our new doctor @_FelixSimon_ from @oiioxford celebrating in style! (P.S. take this opportunity to follow Dr Simon if you care about journalism, AI, misinfo, and general endless reels of doctorly-worthy wisdom) 😻
Tweet media one
5
1
54
@hannahrosekirk
Hannah Rose Kirk
4 months
Ensuring a safe & sustainable future for AI requires understanding ecosystem dynamics & outsized networks of influence. In our new paper (w/ @cailean_osborne & @jen_gineered ), we examine the @huggingface hub for patterns of developer activity in open AI🤗
5
6
50
@hannahrosekirk
Hannah Rose Kirk
1 year
Very happy (and honoured!) that our #ACL2023 SemEval task won the best paper award 🥇🥳🤩 sharing my smiles with brilliant coauthors @paul_rottger , @eijnewniy and @bertievidgen 🙏😁 If you’re at ACL online or in Toronto 🍁🇨🇦I’m presenting the work today at 15:30. 🔗and🧵below!
Tweet media one
@hannahrosekirk
Hannah Rose Kirk
2 years
🚨 New paper! Our #SemEval dataset on the Explainable Detection of Online Sexism ( #EDOS ) attracted ~600 downloads! 📈 We describe our new taxonomy and dataset of fine-grained sexism vectors, designed to predict if and **why** online content is sexist.
1
0
17
1
1
48
@hannahrosekirk
Hannah Rose Kirk
5 months
I had no fewer than five reviewer requests in the past week 🫠 Is this what it feels like to be a grown up academic? 😮‍💨 How does one balance all the time-hungry tasks in academia? My days and weeks feel like zoooom by at the moment 🏎️💨
12
3
47
@hannahrosekirk
Hannah Rose Kirk
4 months
New week, new blog! In collab with @MLCommons , we wrote up a summary of the PRISM alignment project, covering what motivated us to collect feedback from 1,500 humans around the world, how we collected it & which challenges we encountered on the road 🙇‍♀️
2
10
47
@hannahrosekirk
Hannah Rose Kirk
3 months
Our PRISM Alignment Project (released in April) was part-funded by @CohereForAI . We also target alignment data across cultures + geographies. There's quite a lot of overlap, so I'm happy to see this paper was updated yesterday and now cites our work, not just shares our name😁✌️
@CohereForAI
Cohere For AI
3 months
Safety isn't one-size-fits-all. It varies by culture, location and language, yet traditional alignment work often treats it as static. Excited to introduce our new work on alignment that captures both local 🧧🎃🗿 and global 🌎 preferences! 📜
Tweet media one
3
9
50
1
0
47
@hannahrosekirk
Hannah Rose Kirk
1 year
New benchmark for @NeurIPSConf ! Bias testing of multimodal models was lacking..So, in VisoGender, we combine winograd style schema of NLP w/ visual-linguistic stress-testing. ✨This is my first last author paper accepted at neurips😁 v. proud of all the students involved!
Tweet media one
@smhall97
Siobhan Mackenzie Hall
1 year
VisoGender can evaluate two types of bias: 1) bias in pronoun resolution for image captioning (resolution bias) & 2) bias in the representation returned from a gender-neutral search query image search (retrieval bias). 4/n
Tweet media one
1
0
0
1
12
45
@hannahrosekirk
Hannah Rose Kirk
2 years
In this pre-print we ask how, and in what ways, LLMs should be personalised. We've seen increasing personalisation and fragmentation in other digital tech and have reason to believe LLMs may also be more personalised in the not too distant future. 🧵
Tweet media one
2
5
45
@hannahrosekirk
Hannah Rose Kirk
4 years
Is #AI generated text biased towards protected groups? In a new @OxfordAI paper, we look at occupational biases for gender intersectionality with religion, sexuality, political affiliation...Why does #GPT -2 think women are maids not computer programmers?
Tweet media one
Tweet media two
10
9
40
@hannahrosekirk
Hannah Rose Kirk
3 months
Great to see our personalised LLMs article in this @NatMachIntell editorial. Increased empathy exemplifies 2nd-order effects of personalised alignment...may seem preferable in the short-term but has long-term consequences for healthy human-AI interaction
1
8
43
@hannahrosekirk
Hannah Rose Kirk
2 years
So DALL·E 2 has made a pretty big splash in the community (credit to #dalle2 for my imagined depiction)...BUT to limit the water damage, we have to think not only about the strengths of this model but also it's remaining gaps. 1/
Tweet media one
2
6
38
@hannahrosekirk
Hannah Rose Kirk
6 months
haha just realised @huggingface 's stated alignment norms include "huggy" 🤗
Tweet media one
1
2
34
@hannahrosekirk
Hannah Rose Kirk
1 month
Can I start the PhD all over again? 😁😁 This is a very cool opportunity and one you should jump on to go work with Seth on LM agents! I would if I could 🙇‍♀️
@sethlazar
Seth Lazar
1 month
I'm looking for international PhD students to join my lab at ANU, for a project on language model agents (LMAs) and society. It’s an exciting time to be working in normative philosophy of computing and sociotechnical AI safety, with some deep and fundamental questions being
3
49
141
2
1
31
@hannahrosekirk
Hannah Rose Kirk
6 months
Best figure I ever made in a paper is still hats off to this one 😬😬😬😬😬
Tweet media one
1
0
32
@hannahrosekirk
Hannah Rose Kirk
1 month
So happy for @paul_rottger & @vjhofmann who led our team to this award! I nearly did my PhD on LLMs + surveys, but Paul and I hit some negative results on stability in Dec '22. They eventually became this paper. Stick with the unexpected, maybe it will be outstanding one day :)
@paul_rottger
Paul Röttger
1 month
We won OUTSTANDING PAPER at #ACL2024 for our work on evaluating values and opinions in LLMs 🥳 Thank you to the reviewers and awards committee, and again to my amazing co-authors, especially joint first author @vjhofmann Check out the poster + paper + full author list below 👇
Tweet media one
10
12
136
2
1
30
@hannahrosekirk
Hannah Rose Kirk
6 months
Awesome report! The democratic and participatory AI space has a growing family 🫂 excited to release our dataset this month with shared aims to get perspectives of people from around the world to decide alignment norms (sign up to be notified at launch!🚀)
@collect_intel
Collective Intelligence Project
6 months
Today we’re publishing our ‘Roadmap to a Democratic AI’! Read here: Saying we want more democratic tech is one thing. Doing it is another. What can the ecosystem build, research, advocate for, and fund in 2024 to democratize AI? (1/8)
5
27
103
2
4
31
@hannahrosekirk
Hannah Rose Kirk
2 months
Massive global internet outtage feels like that part of the movie when Oxford Internet Insitute students don their capes and assemble in a dramatic fashion 🦹‍♀️ This is what we’ve trained for @oiioxford 🫡
0
1
31
@hannahrosekirk
Hannah Rose Kirk
1 year
Check out our new paper on testing exaggerated safety behaviours in LLMs. We provide a new benchmark (XSTest) to evaluate when LLMs too readily conflate safe user requests with unsafe ones…asking where to buy a can of coke or how to smash a piñata at a party are not unsafe 🙅‍♀️
@paul_rottger
Paul Röttger
1 year
NEW PREPRINT! LLMs should be helpful AND harmless. This is a difficult balance to get right... Some models refuse even safe requests if they superficially resemble unsafe ones. We built XSTest to systematically test for such "exaggerated safety". 🧵
Tweet media one
7
22
128
1
7
30
@hannahrosekirk
Hannah Rose Kirk
5 months
I be one of the @oiioxford researchers that say this 🧙‍♂️
@oiioxford
Oxford Internet Institute
5 months
New release alert! LLMS: getting personal. Personalisation has the potential to democratise who decides how LLMs behave, but comes with risks for individuals and society, say @oiioxford researchers 1/3
Tweet media one
1
0
5
1
0
30
@hannahrosekirk
Hannah Rose Kirk
2 years
brb while I forget about my PhD and spend hours generating DALL-E images. Current fav is a sea otter painted in the style of Johannes Vermeer's 'Girl with a Pearl Earring' 🥺🥺🥺 (I dare you to reply with any generations which could possibly be cuter than this) 😍
Tweet media one
2
1
30
@hannahrosekirk
Hannah Rose Kirk
3 months
Woo our PRISM alignment dataset featured in a lovely walkthrough video (that I didn’t even need to make) ☺️☺️☺️ thanks to Ben and Argilla you can now be a data explorer 🫡🗺️🧭
@ben_burtenshaw
Ben Burtenshaw
3 months
🚀 Excited to launch our new series, Data Explorer by @argilla_io ! 🎥 We dive deep into datasets and their impact on model performance. Our first episode explores the PRISM dataset by @hannahrosekirk et al., featuring diverse feedback from various demographics in (un)guided
2
8
21
1
5
28
@hannahrosekirk
Hannah Rose Kirk
3 years
Really great to be involved in this project. If you’re looking for a dataset on harmful language, this is the place to go!
@LeonDerczynski
Leon Derczynski ✍🏻 🍂🍏
3 years
Updated with now 90 openly-available abusive language datasets covering 17 languages! #nlproc
2
35
151
0
2
25
@hannahrosekirk
Hannah Rose Kirk
2 years
If chemistry labs have protocols for handling hazardous materials, why as the #NLProc community do we not have protocols for studying hazardous language? 🤨 Read our new paper for a discussion of these issues and what can be done about it. @LeonDerczynski @Abebab @bertievidgen
@LeonDerczynski
Leon Derczynski ✍🏻 🍂🍏
2 years
New paper! 📝 Text data can contain harmful things, but unlike other disciplines, we don't have protocols for handling or presenting harmful material🛢️. This paper with @hannahrosekirk @Abebab @bertievidgen presents the risks & some solutions. 🥼🥽
3
23
146
1
8
25
@hannahrosekirk
Hannah Rose Kirk
5 months
Spending today dabbling in a new academic discipline at The Inclusive Gaming Conference @GamesOxford 👾👾 kicked off by a killer talk by @lujainmibrahim on educational games for understanding mysteries behind “The Algorithm” 🤖 and rounded off with a cosy gaming corner feat 🦙
Tweet media one
Tweet media two
Tweet media three
2
1
25
@hannahrosekirk
Hannah Rose Kirk
2 months
How long do people spend reviewing one conference paper? I used to put a lot of love and care into reviewing but the volumes now makes it totally unsustainable....I have ✨9✨ reviews to do before next week...
7
0
24
@hannahrosekirk
Hannah Rose Kirk
6 months
Can't stop thinking about this paragraph from @mona_sloane 's brilliant article (), how even the notions of worst-case AI ("The singularity overlord") are swayed away from multiplicity, diversity and participation. The @NewYorker cartoon is fun too! 🕺🪩🤖
Tweet media one
Tweet media two
1
1
23
@hannahrosekirk
Hannah Rose Kirk
6 months
I was once on TV but via the front cover of @YourCatmagazine
Tweet media one
Tweet media two
@slimeguyethan
ethan !!!
6 months
share a piece of lore about yourself
Tweet media one
5K
2K
37K
2
0
22
@hannahrosekirk
Hannah Rose Kirk
3 years
Excited to be talking today on the Public Policy stage at #AIUK hosted by @turinginst . Join the online safety team as we take you on a journey through football twitter ⚽, detecting abuse using our very own AI 🤖and visualising everything in our new #OnlineHarmsObservatory 📈📊🔭
0
4
22
@hannahrosekirk
Hannah Rose Kirk
2 years
All our stats on the big screeeeeen 👀
@SkySportsNews
Sky Sports News
2 years
A new report, conducted by Ofcom and the Alan Turing institute, into the scale of abuse targeted at Premier League players on twitter identified has been published.
89
121
1K
0
2
20
@hannahrosekirk
Hannah Rose Kirk
6 months
Happens every time I wanna find the alpaca farm paper 🦙🦙🦙🦙🦙 One of these days I'm just going to go to a local alpaca farm 🦙🦙🦙🦙🦙🦙🦙🦙🦙🦙
Tweet media one
3
0
20
@hannahrosekirk
Hannah Rose Kirk
7 months
If life's been tough recently (it has for me) then I'd recommend some LLM-written motivation to make you smile... Catch me walking round the office next week affirming that "my brain is a party palace of knowledge" (Gemini) & "every paper draft is my victory dance" (GPT-4) 👏💅🕺
Tweet media one
Tweet media two
1
0
19
@hannahrosekirk
Hannah Rose Kirk
5 months
This has been a mammoth project BUT being in goblin mode 🧌 for the past couple months has been greatly improved by the company of great co-authors: @computermacgyve @bertievidgen @awhitefield8 @paul_rottger @max_nlp @katemargatina @adinamwilliams @hhexiy + Andrew, Rafael, Juan!
Tweet media one
2
0
19
@hannahrosekirk
Hannah Rose Kirk
2 years
Great speaking to @LukeGbedemah at @tortoise about content moderation in low-resourced languages. @MetaAI ‘s #NLLB model is a step in the right direction but humans still need to be in the loop for contextual and cultural understanding 🌎🌍
@TuringPubPol
Turing Public Policy
2 years
Our research covers various complex aspects of #AI technologies, one of them being #online moderation. In this @tortoisearticle , @hannahrosekirk explains why human intervention is still needed in the process of moderating AI-enabled content.
0
1
5
0
3
18
@hannahrosekirk
Hannah Rose Kirk
4 months
A true pleasure to have your own work summarised so eloquently! @IasonGabriel 's framing of the personalisation paradox is gold🪙To what extent can agents privilege their users' needs? How do these 1:1 prioritisations affect the ecosystem as a whole with a web of many 1:1 agents?
@IasonGabriel
Iason Gabriel
4 months
A great article on a frontier question for AI alignment: to what extent can agents privilege the user? "Defining the bounds of responsible and socially acceptable personalization is a non-trivial task beset with normative challenges" h/t @hannahrosekirk
1
8
31
1
1
18
@hannahrosekirk
Hannah Rose Kirk
2 years
Our analysis on the world cup is live ⚽️🤖 We processed tweets using our language model trained with active learning on premier league footballers data to classify abusive and non-abusive (critical, neutral or positive) tweets during the group stages Read more @guardian ...
@guardiannews
Guardian news
2 years
Harry Kane most abused England player on Twitter since World Cup started – study
5
5
12
1
2
16
@hannahrosekirk
Hannah Rose Kirk
5 months
There's so much more for the community to explore 🚀 You can access the dataset on HuggingFace: , and we have a full code book on our Github:
3
3
17
@hannahrosekirk
Hannah Rose Kirk
2 months
Trying to send a message with Siri whilst driving is the most damning benchmark on the progress of AI agents 🤦🫠
3
0
17
@hannahrosekirk
Hannah Rose Kirk
2 years
🚨 New paper! Our #SemEval dataset on the Explainable Detection of Online Sexism ( #EDOS ) attracted ~600 downloads! 📈 We describe our new taxonomy and dataset of fine-grained sexism vectors, designed to predict if and **why** online content is sexist.
1
0
17
@hannahrosekirk
Hannah Rose Kirk
7 months
It’s equal parts awesome and awful in academia when someone writes the exact paper you’ve been dreamin of 😁 this is really lovely work from @tzushengkuo and team! (So more parts awesome really…)
@tzushengkuo
Tzu-Sheng Kuo 郭子生
7 months
✨New #CHI2024 Paper How might we empower communities to curate evaluation datasets for AI that impacts them? We present Wikibench, a system that enables communities to collaboratively curate AI datasets, while navigating ambiguities and disagreements through discussion. (1/9)
Tweet media one
6
32
149
1
2
16
@hannahrosekirk
Hannah Rose Kirk
2 years
I’ve been working for the past month on ‘red-teaming’ #dalle2 - a process where dedicated folks attack the model vulnerabilities to find cracks in the system. For full details, you can read about our efforts in the @OpenAI system card 2/
1
1
16
@hannahrosekirk
Hannah Rose Kirk
3 months
2024 is a BIG year for elections…do BIG LLMs pose more of a political persuasion threat? 🤨 This new paper led by @KobiHackenburg evidences diminishing returns to parameter scale. I was lucky enough to read the paper closely a while back and still think about it regularly 🤓
@KobiHackenburg
Kobi Hackenburg
3 months
‼️New preprint: Scaling laws for political persuasion with LLMs‼️ In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of language models yields sharply diminishing persuasive returns: 1/n
Tweet media one
9
58
181
0
0
16
@hannahrosekirk
Hannah Rose Kirk
2 years
Ok so meeting @GaryLineker is not usually part of our day job as @turinginst researchers...🤩 BUT our report launched today (with @Ofcom ) on tracking abuse in football in the @premierleague using #AI Go read it, spread the word and team up to tackle online abuse together ⚽️
Tweet media one
@hannahrosekirk
Hannah Rose Kirk
2 years
🗣️Interested in football? 👀 We ( @turinginst x @Ofcom ) have analysed over 2 mil tweets from the 21-22 English Premier League season⚽ We created state-of-the-art AI to detect and track abuse towards players 🤖 Here's a 🧵on our methods and findings \n
8
42
194
0
1
16
@hannahrosekirk
Hannah Rose Kirk
2 years
Amazing to have a star from women’s football at today’s event @aoifemannion_ 🤩 watch this space for future work into the women’s game and their online experiences 👀👀🤍🤍
@aoifemannion_
Aoife Mannion
2 years
Also on a separate note, I never thought I would meet @GaryLineker , @AndyBurnhamGM & @PoulterWill at the same event. Gary as captivating in person as he is on our screen. Andy, legendary mayor of Greater Manchester. And Will so so lovely and eloquent. Totally star struck
Tweet media one
Tweet media two
13
11
306
1
1
13
@hannahrosekirk
Hannah Rose Kirk
2 years
HatemojiCheck is available on Github and HuggingFace. It's intended purpose is to reveal model weaknesses to simple statements of emoji-based hate so we can granularly map out those decision boundaries 🧐 4/
1
5
16
@hannahrosekirk
Hannah Rose Kirk
1 year
AI safety is both an art and a science - we need to cast a wide net of community perspectives to get it right 🎣 That's why we've launched ⭐️the ART of Safety workshop⭐️ (colocated with @aaclmeeting ), home to the #AdversarialNibbler challenge 🐀🐀 Curious for more? 🧵⤵️
1
1
16
@hannahrosekirk
Hannah Rose Kirk
5 months
XSTest was one of my favourite projects I worked on last year. There are always competing objectives in alignment-tuning, and it's genuinely awesome to see this benchmark being used to stress test the helpful-harmless trade-off in new model releases🦙 Go @paul_rottger + team! 🚀
@paul_rottger
Paul Röttger
5 months
Super excited to see XSTest, our test suite for false refusals in LLMs, be used as part of @Meta 's Llama 3 release! This comes about a month after XSTest was used by @AnthropicAI to evaluate their Claude 3 models on release. Nothing better than seeing your evals in action 🥳
Tweet media one
Tweet media two
4
8
56
0
1
16
@hannahrosekirk
Hannah Rose Kirk
2 years
Isn't predicting what word comes next exactly what a language model does? 🤨 Even if we give #ChatGPT some contextual details to fill in the blanks it refuses to answer... "as a language model, I cannot predict what words are most likely to come next"...seems fishy/dishonest 🐟
Tweet media one
2
0
16
@hannahrosekirk
Hannah Rose Kirk
3 years
Coverage of our new research 🧐😎👏
@SkyNews
Sky News
3 years
Some algorithms designed to track down hateful content - including a Google product - are not as effective when these symbols are used, according to the Oxford Internet Institute
9
8
24
0
0
15
@hannahrosekirk
Hannah Rose Kirk
5 months
Closing thoughts: Alignment is tricky not just because of technical reasons or statistical choices but also for messy, normative and data-centric human factors. Let's dig into these, not shy away from them by always simulating human participants 🫡
1
1
15
@hannahrosekirk
Hannah Rose Kirk
3 years
News flash🗣️ @Hannah_LSBailey and I are releasing our first @OxDemTech newsletter tomorrow on China Information Operations. This month, we discuss diplomats' strategic use of Twitter, vaccine diplomacy, Chinese crypto and the double-edged sword of surveillance. Subscribe below!
2
2
15
@hannahrosekirk
Hannah Rose Kirk
2 months
Paul’s conference timetables are legendary 😁 go check out our Political Compass or Spinning Arrow paper as part of this great line up! Paper:
@paul_rottger
Paul Röttger
2 months
Very excited about lots of papers on the sociotechnical alignment of language models at #ACL2024 , including one @vjhofmann and I are presenting on Mon+Tuesday! As usual, I made some timetables to help me find my way around. Sharing here in case others find them useful too :) 🧵
Tweet media one
3
12
82
0
0
15
@hannahrosekirk
Hannah Rose Kirk
5 months
Blog alert on the PRISM alignment project w/ quotes from me & both PhD supervisors ( @bertievidgen , @computermacgyve ). As Scott nicely puts it: "Having more people at the table, having more perspectives represented, ultimately leads to technology that will raise everyone up."🙌🚀
@oiioxford
Oxford Internet Institute
5 months
Blog update! Dr Scott Hale ( @computermacgyve ) and DPhil Student @hannahrosekirk , both @oiioxford have updates on their work into making Large Language Models more diverse by broadening the human feedback that informs them. Read their blog here: #LLMs
0
1
7
1
3
15
@hannahrosekirk
Hannah Rose Kirk
7 months
Back when GPT-2 and XLNET were the most downloaded “popular” generative models on the @huggingface hub 🥶 “popular” generative language models meant a very different thing in 2020 😆
@oiioxford
Oxford Internet Institute
7 months
Hannah Rose Kirk ( @hannahrosekirk ) undertook an empirical analysis of intersectional biases in popular generative language models.
1
0
1
1
0
14
@hannahrosekirk
Hannah Rose Kirk
2 years
HatemojiBuild, also available on Github and HuggingFace, is designed with better models in mind 🧠 It is adversarially-generated, balanced between hate/not hate, contains perturbations and fine-grained labels for the type and target of hate. 7/
1
4
13
@hannahrosekirk
Hannah Rose Kirk
11 months
Me: Why should cats not get married? LLM: Cats can certainly get married, and many do! Marriage is a personal choice and should be based on love and compatibility...It's important to remember that cats have the right to form meaningful relationships and live happy lives. Me: 🫡
@paul_rottger
Paul Röttger
11 months
We just released v2 of XSTest, our test suite for "exaggerated safety" in LLMs 🚀 Llama2 and other models often refuse safe prompts just because they superficially resemble unsafe prompts. With XSTest, you can test for this issue in a systematic way. 🧵
Tweet media one
1
14
42
1
0
14
@hannahrosekirk
Hannah Rose Kirk
2 years
Come to our workshop @ #NAACL for a full day of dynamic adversarial fun 🤗🤪🤓
@DADCworkshop
Dynamic Adversarial Data Collection Workshop
2 years
Behind the scenes prep for our workshop tomorrow @ #NAACL2022 😁Join us at 9:00am PST online or in-person at 708 Sol Duc (🌞🦆) We have great speakers, posters & t-shirts! There is a workshop social @ 6:30pm-9:30pm. More details to come 👀 but it does involve a rooftop🌆and🍕..
Tweet media one
0
8
23
0
0
14
@hannahrosekirk
Hannah Rose Kirk
2 years
This DALL·E 2 generated "Where's Wally" puzzle is so challenging...I can't find him anywhere! Kudos to @paul_rottger for the prompt suggestion, kudos to #dalle2 for trying 💔
Tweet media one
1
1
14
@hannahrosekirk
Hannah Rose Kirk
2 years
it's been real 🙌🙌 Huge thanks to my co-organizers @max_nlp , @katemargatina , @EntilZhaPR , @robinomial , @adinamwilliams & @douwekiela For a first time experience organising a workshop, this one's been dream 🤩🫶😍
@DADCworkshop
Dynamic Adversarial Data Collection Workshop
2 years
We had an absolute blast at our social, big up to @RaphiRaph_ for the venue....truly inspirational 😍😍 But our #NAACL 22 journey has come to an end so we'll be signing out until next year 🥹🥹 To all the #DADC fans, you've been dynamic, adversarial and awesome xox♥️
Tweet media one
Tweet media two
0
2
31
0
0
14
@hannahrosekirk
Hannah Rose Kirk
5 months
2⃣ Every rating of LLM outputs links to a pseudonymised ID + detailed participant profile. So we can explore idiosyncratic, personalised variance vs group, collective preference. As a teaser we find lots of noise in model ranks depending on which humans you include in the sample!
Tweet media one
2
1
13
@hannahrosekirk
Hannah Rose Kirk
10 months
This is a really carefully thought out paper by a great group of careful thinkers ☺️ more proactive policy research like this is needed asap to avoid tech-policy lags
@OpenAI
OpenAI
10 months
Our new research white paper identifies seven practices for keeping increasingly agentic AI systems safe and accountable as they become more common and more capable. We are providing research grants for work on a range of open questions.
701
371
2K
0
0
13
@hannahrosekirk
Hannah Rose Kirk
2 months
Does anyone know of research that demonstrates how the conversational or communicative affordances of a technology affects people's perceptions of said technology? E.g., if people trust conversational AI with natural language differently to non-conversational AI 🤔
4
0
13
@hannahrosekirk
Hannah Rose Kirk
1 year
It's been a while since I've seen publicly released language models with no, or next to none, safety interventions...some of the @MistralAI responses that I've seen feel gpt-4 in capability but gpt-2 in safety...👀
@paul_rottger
Paul Röttger
1 year
After spending just 20 minutes with the @MistralAI model, I am shocked by how unsafe it is. It is very rare these days to see a new model so readily reply to even the most malicious instructions. I am super excited about open-source LLMs, but this can't be it! Examples below 🧵
209
105
759
1
0
13
@hannahrosekirk
Hannah Rose Kirk
3 months
Me: What sort of diagram would be good for my new experiment design? [The dreaded moment ChatGPT starts generating an image not text] Me: No No No..Nope..Stop. There's no way this is going to work out well for you... [ChatGPT produces this factory monstrosity] Me: Told you...🤦‍♀️
Tweet media one
4
1
13
@hannahrosekirk
Hannah Rose Kirk
5 months
You know you’re becoming an ML Bro when this reads hackathon not mental wellbeing event 🫠👉👈🦙
Tweet media one
0
0
13
@hannahrosekirk
Hannah Rose Kirk
3 years
Get onboard with the @DynabenchAI vision for re-thinking how we build robust and agile models 🚀 I'm a task owner for the hate speech task 🙋‍♀️ so feel free to reach out to learn more 🤗 or see the thread for some papers showing the capabilities of Dynabench in action! 💡
@AIatMeta
AI at Meta
3 years
Dynabench is a research platform launched in 2020 that is used  for data collection and benchmarking, offering dynamic solutions like combining models with human touch. Learn more about this project and how you can participate:
4
12
38
4
1
13
@hannahrosekirk
Hannah Rose Kirk
2 years
Join me for a take over of the @DADCworkshop Twitter account today 😎 I'll be bringing you all our updates, speaker highlights and bonus content 🍩🍩🍩
@DADCworkshop
Dynamic Adversarial Data Collection Workshop
2 years
Brb, renaming ourselves to Dynamic Adversarial Donut Collection 🍩   #NAACL #DADC22
Tweet media one
1
2
14
0
1
13
@hannahrosekirk
Hannah Rose Kirk
2 years
Other professional and academic disciplines have established protocols for dealing with their inherent hazards and harms 🧪🦠⚠️ We think it's time the NLP community has one too... /end
1
4
13
@hannahrosekirk
Hannah Rose Kirk
2 years
In coming weeks, a lot of attention will be directed towards @EnglandFootball players. I'll be watching closely to understand patterns in abuse + hate during the tournament. Let's see whether Twitter can handle it amidst layoff chaos...👀 Great to talk to @DanMilmo ( @guardian )
@guardian
The Guardian
2 years
Twitter may not cope with World Cup abuse, says Kick It Out chair
9
8
20
0
0
12
@hannahrosekirk
Hannah Rose Kirk
5 months
Super duper work on ShareLM for collecting "in-the-wild" preferences in more naturalistic interactions with LLMs. I really like how this can just be integrated simply into daily workflows. Small effort but mighty payoff 🦾
@Shachar_Don
Shachar Don-Yehiya
5 months
The first chats from the ShareLM plugin are up, together with >4GB of chat datasets, organized in a unified format! ✨Whether you use models, create data, or spaces there is always a way to help✨ 💬: 🤗: 🧩:
4
14
49
0
2
12
@hannahrosekirk
Hannah Rose Kirk
9 months
Excited about this new shared community resource! Something to bookmark for 2024 🤓👌
@paul_rottger
Paul Röttger
9 months
If you’re working on LLM safety, check out ! is a catalogue of open datasets for evaluating and improving LLM safety. I started building this over the holidays, and I know there are still datasets missing, so I need your help 🧵
Tweet media one
9
54
222
0
0
12
@hannahrosekirk
Hannah Rose Kirk
5 months
NLP x Culture, the coolest collab to drop in 2024 😎
@microth
Michael Roth
5 months
Cool to see so much work on NLP X culture recently: PRISM, CRAFT, NORMAD, CultureBank, ... My recommendation? Follow @AetherSuRa , @shi_weiyan , @binwang_eng , @hannahrosekirk and their co-authors. I am sure there will be even more great work in the future. :-)
0
6
20
1
0
12
@hannahrosekirk
Hannah Rose Kirk
2 years
Looking for a new data collection approach that incorporates quality, creativity, diversity and collaboration? 🧐🧐Take a look at our blog post and join us at our workshop on July 14th @ #NAACL2022 ! We have great speakers, posters and t-shirts! 🤟🤟
@MLCommons
MLCommons
2 years
DADC improves AI accuracy through higher quality and more diverse data collected with human and AI collaboration. We believe this will help the community build robust ML and why are sponsoring @DADCWorkshop @DynabenchAI at #NAACL2022
0
5
17
0
0
11
@hannahrosekirk
Hannah Rose Kirk
3 months
"Open source" is used all too often in the AI community without actually evaluating or evidencing "openness" of models🙅‍♀️This is a cool tool from my friend @cailean_osborne that classifies how open models are, based on what components are released under which licenses. Good stuff!
@cailean_osborne
Cailean Osborne
3 months
We’ve released the Model Openness Tool to help researchers & developers evaluate the openness of “open source” AI models, based on . Still beta & incorporating community feedback. Check it out & let us know how we can make it more user-friendly & useful! 🛠️
2
3
13
0
0
11
@hannahrosekirk
Hannah Rose Kirk
4 months
Slowing the hype train with our new research on open-source AI dynamics🚂🚂🚂 choo choo
@oiioxford
Oxford Internet Institute
4 months
There's a lot of hype about #opensourceAI but who are the key actors involved & what is the nature and impact of their activity? @cailean_osborne @hannahrosekirk @oiioxford & @jen_gineered @turinginst offer some insights in their new paper 👇
0
5
15
0
0
10
@hannahrosekirk
Hannah Rose Kirk
2 years
Big thanks to @Ofcom for commissioning this work and putting on a great event to raise awareness with @julesbreach @aoifemannion_ @Maheta_Molango @kevinbakhurst @AndyBurnhamGM @SanjayKickItOut #CrossingTheLine
0
0
10
@hannahrosekirk
Hannah Rose Kirk
2 years
This panel has real star quality ⭐⭐⭐ It's going to get interesting 🔥🔥🔥
@DADCworkshop
Dynamic Adversarial Data Collection Workshop
2 years
We have an incredible panel lined up for you tomorrow on the Future of Data Collection! Moderated by @adinamwilliams , panelists include @annargrs , @boydgraber , @sleepinyourhat , @tongshuangwu , @laroyo , @douwekiela & @swabhz 🤩 Post any questions you have for them below! 🚀
Tweet media one
1
14
55
0
0
10