GenBench
@GenBench
Followers
442
Following
104
Media
72
Statuses
193
State-of-the-art generalisation testing in NLP. Tag us for a RT of your NLP generalisation paper tweet!
Joined April 2022
The GenBench workshop is back! Do you work on generalisation (benchmarking) in #NLProc? Submit to the 2nd edition (https://t.co/XqMMYRW8vQ), co-located with #EMNLP2024. We have a regular track and a collaborative benchmarking task (CBT) that's fully LLM-focused this year (1/6)
genbench.org
The second workshop on generalisation (benchmarking) in NLP
That's a wrap! We (@glnmario, @christos_c, @_dieuwke_, @vernadankers, @khuyagbaatar_b, @a_kazemnejad & @ryandcotterell) thank all presenters, authors, reviewers and attendees!! The keynotes, the cats, the posters, the talks and the lively panel: it was fantastic!
so proud of @HayleyRossLing for getting a best paper award at @GenBench this year!!
I'm sure @TeaAnd_OrCoffee would be too :) check out our paper and share if you think homemade cats are cats!
New paper with @najoungkim and @TeaAnd_OrCoffee testing if LLMs can draw adjective-noun inferences like humans! Turns out they often can, and even generalize to unseen combinations. But they're more optimistic about "artificial intelligence" than humans. https://t.co/u9RHG54HX7
Congratulations!
Come listen to the hot takes of our panelists in the Brickell room! Do we still need generalisation evaluation? #GenBench2024 #EMNLP2024
Did you miss the GenBench poster session? Don't worry, we've got you: here are (nearly all) posters! #GenBench2024 #EMNLP2024 Next up: keynote by Sameer Singh at 3!
Last spotlight presentation: MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models https://t.co/4pyv01TbWE Unfortunately the authors couldn't make it, so the work is kindly presented by their colleague Hengyi Wang.
Continuing with Bastian Bunzeck, presenting The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns https://t.co/70kDItm3BB
Next presenter is Jiwoo Lee, presenting MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models https://t.co/UW8x37AANT
Second up, Maxim Kurkim presenting OmniDialog: A Multimodal Benchmark for Generalization Across Text, Visual, and Audio Modalities https://t.co/cdanQ7RAnO
Spotlight time! Mirella Bueno on MLissard: Multilingual Long and Simple Sequential Reasoning Benchmarks https://t.co/ARmGeONz2c