Max Bartolo

@max_nlp

Followers
2,080
Following
645
Media
51
Statuses
630

I lead the Command modelling team at @Cohere and co-chair the @DynabenchAI @MLCommons working group. Prev. @DeepMind, @MetaAI/FAIR & @BloomsburyAI.

Joined November 2016
@max_nlp
Max Bartolo
26 days
OMG @karpathy retweeted our work. Somebody pinch me 🤯❤️
@karpathy
Andrej Karpathy
26 days
Nice new read on tokenization! You've heard about the SolidGoldMagikarp token, which breaks GPT-2 because it was present in the training set of the tokenizer, but not the LLM later. This paper digs in with a lot more depth and detail, on a lot more models, discovering a less…
47
397
3K
13
9
292
@max_nlp
Max Bartolo
2 years
Super excited to announce that I'm now @CohereAI ! 🥳 I'm convinced that LLMs will create tremendous value in the next few years, and Cohere is a fantastic place to be to help contribute (and the people are awesome)! 😄
Tweet media one
18
11
250
@max_nlp
Max Bartolo
1 year
🚨 Internship Alert! 🚨 Interested in doing research on cutting-edge large language models @CohereAI ? Apply here:
0
36
245
@max_nlp
Max Bartolo
5 years
"The future is not about bigger models, it's about bigger ideas" - Phil Blunsom @eurnlp
Tweet media one
1
41
183
@max_nlp
Max Bartolo
2 years
Excited to start as a research scientist intern @DeepMind today. Looking forward to working with @huangposen & @Johannes_Welbl ! I'm fortunate enough to be there in person so ping me if you're around and want to chat or grab a coffee 🚀
9
6
181
@max_nlp
Max Bartolo
2 years
Data collection is slow and expensive, so we give annotators a little help 🤝. Introducing Generative Annotation Assistants (GAAs) to make data collection more efficient and effective 🚀. Work will be presented at #NAACL2022 in Seattle! Paper: [1/n]
Tweet media one
3
28
115
@max_nlp
Max Bartolo
2 years
This Valentine's day, to celebrate our love for dynamic adversarial data, the DADC ( @NAACLmeeting '22) workshop is announcing our first call for papers. We would love for you to join us: ❤️
2
14
93
@max_nlp
Max Bartolo
3 months
Incredibly proud of our world class team driving continuous improvements @cohere and delivering a best-in-class RAG-powered, long-context-capable, multilingual LLM available to the research and dev communities. Try it out at & 🚀
@aidangomez
Aidan Gomez
3 months
⌘-R Introducing Command-R, a model focused on scalability, RAG, and Tool Use. We've also released the weights for research use, we hope they're useful to the community!
31
195
1K
4
5
93
@max_nlp
Max Bartolo
4 years
Beat the AI 🤔🆚🤖 Investigating Adversarial Human Annotation for Reading Comprehension (in TACL, ) w/ @ARoberts9 @Johannes_Welbl @riedelcastro & Pontus Stenetorp will be presented at #emnlp2020 . Data & leaderboard also available: 1/N
1
25
93
@max_nlp
Max Bartolo
2 months
Just 3.5wks after launching Command R, we are excited to release Command R+. It is bigger, better, bolder and goes where no Command model has gone before. It is the result of months of hard work by the incredible team @Cohere , and we're releasing model weights for you to use at…
@aidangomez
Aidan Gomez
2 months
⌘R+ Welcoming Command R+, our latest model focused on scalability, RAG, and Tool Use. Like last time, we're releasing the weights for research use, we hope they're useful to everyone!
26
190
986
2
8
84
@max_nlp
Max Bartolo
3 years
Excited to share that our work on Improving QA Model Robustness with Synthetic Adversarial Data Generation w/ @TristanThrush , @robinomial , @riedelcastro , Pontus Stenetorp, @douwekiela at @ucl_nlp & @facebookai will be presented at #EMNLP2021 Paper: [1/n]
Tweet media one
2
22
81
@max_nlp
Max Bartolo
2 months
Building best-in-class LLMs requires a rare combination of intuition, resources and an insanely talented team. Intuition can guide you, but you never really know where you're going to end up. So stoked to see how well our models have been received. Thank you for your support! 🙏
Tweet media one
Tweet media two
2
10
79
@max_nlp
Max Bartolo
2 years
Announcing what we hope will be one of the best AI/NLP competitions in 2022: the DADC Shared Task () @NAACLmeeting in Seattle 🏆! We have 3 awesome tracks for you to participate in (two data-centric & one model-centric). Details below 👇 [1/n]
2
29
72
@max_nlp
Max Bartolo
4 years
Excited to start as a research intern at @facebookai with @douwekiela , @riedelcastro and an amazing group of people working on a super cool project! ❤️
1
0
67
@max_nlp
Max Bartolo
28 days
📢 Glitch Tokens Detected 📢 Tokens are the building blocks of LLMs -- but there's a problem! Tokenizers and LLMs aren't trained on perfectly identical or static corpora, meaning that tokenizers and models are often out of sync, leading to unseen 'glitch tokens' that can make…
@magikarp_tokens
Sander Land
28 days
Our paper about reliably finding under-trained or 'glitch' tokens is out! We find up to thousands of these tokens in some #LLMs , and give examples for most popular models. More in 🧵
Tweet media one
15
160
859
2
13
58
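The idea in the glitch-token thread above can be sketched in miniature: tokens that the LLM barely saw during training tend to sit near their random initialisation, so their embedding norms are extreme low outliers. Below is a toy illustration with a synthetic vocabulary and made-up embeddings, not the paper's actual detection method:

```python
import math
import random
import statistics

random.seed(0)

def embedding_norms(embeddings):
    """L2 norm of each token's embedding row."""
    return {tok: math.sqrt(sum(x * x for x in vec)) for tok, vec in embeddings.items()}

def flag_glitch_candidates(embeddings, z_cutoff=-4.0):
    """Flag tokens whose embedding norm is an extreme low outlier,
    a rough proxy for 'never (or barely) updated during training'."""
    norms = embedding_norms(embeddings)
    mean = statistics.mean(norms.values())
    stdev = statistics.stdev(norms.values())
    return sorted(tok for tok, n in norms.items() if (n - mean) / stdev < z_cutoff)

# Toy vocabulary: most tokens get trained embeddings with healthy norms,
# while two tokens stay near their (tiny) random initialisation.
vocab = [f"tok_{i}" for i in range(500)]
emb = {t: [random.gauss(0, 1.0) for _ in range(64)] for t in vocab}
emb[" SolidGoldMagikarp"] = [random.gauss(0, 0.01) for _ in range(64)]
emb[" petertodd"] = [random.gauss(0, 0.01) for _ in range(64)]

print(flag_glitch_candidates(emb))
```

With the synthetic setup above, only the two near-initialisation tokens fall below the cutoff; real detection has to contend with messier norm distributions.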
@max_nlp
Max Bartolo
2 years
I'll be presenting our work introducing Generative Annotation Assistants making data collection more efficient and effective in-person tomorrow (13/07) at #NAACL2022 at 10:45PST (9D): . Join us! 🚀 w/ @TristanThrush @riedelcastro @robinomial @douwekiela
Tweet media one
0
13
57
@max_nlp
Max Bartolo
11 months
I'm in Toronto 🇨🇦 for #ACL2023NLP #ACL . DM me if you want to chat!
Tweet media one
3
3
57
@max_nlp
Max Bartolo
2 years
@douwekiela from @huggingface will be with us in-person speaking about "Improving Multimodal Evaluation and Exploring Foundational Language and Vision Alignment" for the @ucl_nlp meetup at @ai_ucl on Wed, Jun 29th at 18:30. Join us: 🚀
Tweet media one
0
7
50
@max_nlp
Max Bartolo
3 years
Super excited to announce that our proposal has been accepted and The First Workshop on Dynamic Adversarial Data Collection (DADC) will take place at #NAACL2022 @aclmeeting in Seattle 🇺🇸! #NLProc Stay tuned, this is going to be fun! 🚀
3
12
53
@max_nlp
Max Bartolo
2 years
This has to be one of the best poster session venues I've been to! 🪴 #AKBC2022 @AKBC_conf
Tweet media one
0
2
52
@max_nlp
Max Bartolo
3 months
Time for #EACL2024 in Malta. DM me if you want to chat about research, @cohere , best places to get pastizzi in Malta, or anything in between!
Tweet media one
1
3
48
@max_nlp
Max Bartolo
5 years
I'm happy to announce the start of my PhD at @UCLMR with and @riedelcastro . Great to be working alongside former colleagues @_rockt @PSH_Lewis @backprop2seed as well as @PMinervini @Johannes_Welbl @mindjimmy & @egrefen . Looking forward to the next 3 years!
6
3
50
@max_nlp
Max Bartolo
1 month
In Vienna 🇦🇹 for @iclr_conf #ICLR2024 . DM if you're up for a coffee!
Tweet media one
2
2
49
@max_nlp
Max Bartolo
2 years
Really excited to hear about the interest in our work on Improving QA Model Robustness with Synthetic Adversarial Data Generation () both from the research community and industry. To facilitate this, we're sharing our synthetic data & question generators!🥳
2
14
47
@max_nlp
Max Bartolo
5 months
Extremely insightful work by @tomhosking digging into what human feedback actually measures, accepted at @iclr_conf '24. 🚀 DM me if you're interested in learning more or if you're excited about exploring the limits of what we know about LLMs with a research internship @cohere !
@tomhosking
Tom Hosking
5 months
"Human Feedback is not Gold Standard" was accepted at ICLR 2024 🥳 I'd love to chat about the limits of human feedback wrt LLM alignment (and about @cohere ) if you're going to be at the conference! 🇦🇹 Thanks again to @max_nlp for making it an awesome internship experience ❤️
5
26
188
1
3
45
@max_nlp
Max Bartolo
6 months
I'm sorry, but someone needs to say this. DROP was one of the most thoughtfully created, insightful and novel datasets of its time. Anybody using a dataset for eval, particularly with a new family of models, is responsible for basic postprocessing. Don't blame the dataset.
@clefourrier
Clémentine Fourrier 🍊
6 months
⚠️ We are removing DROP from the Open LLM Leaderboard! With leaderboard evaluation data openly shared on 2000+ models, we did a deep dive with our friends @AiEleuther and @try_zeno , & found out that its original implementation is unfair to many models 😱
8
31
154
1
4
43
@max_nlp
Max Bartolo
1 month
Human preference is complex, multi-dimensional and personal. This work is a treasure-trove of information. An absolute must read (at least twice) for anyone working with LLMs or generative AI systems that rely on human feedback 🤖🙋
@hannahrosekirk
Hannah Rose Kirk
1 month
Today we're launching PRISM, a new resource to diversify the voices contributing to alignment. We asked 1500 people around the world for their stated preferences over LLM behaviours, then we observed their contextual preferences in 8000 convos with 21 LLMs
Tweet media one
21
98
409
1
5
41
@max_nlp
Max Bartolo
15 days
Introducing Aya 23, new SOTA multilingual models just dropped
@CohereForAI
Cohere For AI
15 days
Today, we launch Aya 23, a state-of-the-art multilingual 8B and 35B open weights release. Aya 23 pairs a highly performant pre-trained model with the recent Aya dataset, making multilingual generative AI breakthroughs accessible to the research community. 🌍
7
112
438
0
7
38
@max_nlp
Max Bartolo
7 months
Coral still knows.
Tweet media one
1
2
37
@max_nlp
Max Bartolo
8 months
It's been 1yr @Cohere already! Among the many exciting things we're working on, one I'm particularly excited about is using models in the loop for better data. If you're not using models in the loop to make your data collection efforts more effective, you're missing out!
@max_nlp
Max Bartolo
2 years
I'll be presenting our work introducing Generative Annotation Assistants making data collection more efficient and effective in-person tomorrow (13/07) at #NAACL2022 at 10:45PST (9D): . Join us! 🚀 w/ @TristanThrush @riedelcastro @robinomial @douwekiela
Tweet media one
0
13
57
1
2
37
@max_nlp
Max Bartolo
3 years
How well do your RC models perform on more challenging questions? adversarialQA () is now available for easy access in @huggingface datasets!
1
11
35
@max_nlp
Max Bartolo
2 years
Made it to Seattle 🇺🇸 for #NAACL2022 ! Looking forward to my first in-person conference since 2019 🇮🇹. Ping me if you want to chat research or anything else! 🚀
Tweet media one
1
0
35
@max_nlp
Max Bartolo
3 years
Just noticed that #AdversarialQA is the 4th most downloaded QA dataset at 🤗 with nearly 25k downloads. Super excited to see what everyone's been working on! 🥳
Tweet media one
0
6
33
@max_nlp
Max Bartolo
22 days
The Aya Dataset paper coming soon to an ACL near you! 🥳 Massive congrats to all collaborators and the fantastic community who contributed to this open resource powering SOTA multilingual capabilities 🔥
@CohereForAI
Cohere For AI
22 days
🌱 We’re very excited that our work "Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning" was also accepted! Congrats to authors @singhshiviii , @freddie_v4 , @mrdanieldsouza , @tellarin , @freakynut , @weiyinko_ml , @krypticmouse , @rv__init__ , @DeividasMat ,
Tweet media one
1
22
64
1
6
31
@max_nlp
Max Bartolo
16 days
Evaluation is one of the most important research areas in the LLM space. As the saying goes, you can't improve what you don't measure
@clefourrier
Clémentine Fourrier 🍊
16 days
I discovered at ICLR 2024 that a lot of what I take for granted about LLM evaluation is actually not that widely known... So I made a blog! - how do we currently do LLM evaluation? ⚖️ - most importantly, what is it actually useful for? 🤔
10
76
369
0
5
30
@max_nlp
Max Bartolo
2 months
⌘R+ is now the default model powering HuggingChat. Try it out! 🚀
@victormustar
Victor M
2 months
New on HuggingChat: ✨Cohere Command R+ Chat with it here:
Tweet media one
9
42
158
0
5
29
@max_nlp
Max Bartolo
4 years
Call for papers for our #NeurIPS2020 workshop HAMLETS: Human And Model in the Loop Evaluation & Training Strategies is now live! Speakers include @jennwvaughan , @ajratner , @EmmaBrunskill , Sanjoy Dasgupta, Dan Weld, Kristen Grauman & Finale Doshi-Velez. See
0
7
27
@max_nlp
Max Bartolo
2 months
Rerank 3 is 🔥🔥🔥
@cohere
cohere
2 months
Introducing Rerank 3: our newest foundation model purpose built to enhance enterprise search and Retrieval Augmented Generation (RAG) systems, enabling accurate retrieval of multi-aspect and semi-structured data in 100+ languages.
6
88
460
0
2
26
@max_nlp
Max Bartolo
6 years
We'll be presenting the #ShARC 🦈 dataset () tomorrow at the #emnlp2018 Grand Hall poster session between 09:00-10:30. Joint work with @marzieh_saeidi @PSH_Lewis @sameer_ @_rockt @mikesheldon @gbouchar @riedelcastro . See you there! 🦈
0
6
26
@max_nlp
Max Bartolo
1 month
Tweet media one
0
1
25
@max_nlp
Max Bartolo
4 years
Check out our recent work on undersensitivity in Reading Comprehension and investigation of generalisable adversarially-robust training, achieving new SOTA on AddSent and AddOneSent w/ @Johannes_Welbl @PMinervini @riedelcastro
@PMinervini
Pasquale Minervini 🚀 hiring postdocs!
4 years
Q: What adversarial failure mode do reading comprehension chopsticks suffer from? A: Undersensitivity! (confidence: 99.7%) 🙂🥢 -- more in our "Undersensitivity in Neural Reading Comprehension" (), by @Johannes_Welbl @max_nlp Pontus and @riedelcastro
Tweet media one
3
15
61
1
4
25
@max_nlp
Max Bartolo
3 months
Some annotator somewhere is having a good laugh right about now
@alexalbert__
Alex Albert
3 months
Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval. For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of…
Tweet media one
589
2K
12K
1
1
25
@max_nlp
Max Bartolo
8 months
We're live! Check out ! 🚀
@cohere
cohere
8 months
We are excited to announce that our Chat API with Retrieval-Augmented Generation (RAG) is now available in a public beta. The API is powered by Command, Cohere’s flagship generative LLM.
199
543
6K
1
0
25
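The retrieve-then-generate pattern behind a RAG chat API, as announced above, can be sketched with a toy bag-of-words retriever. This is an illustrative sketch only, not Cohere's API: the corpus, scoring, and prompt template are all invented for the demo.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Rank documents by similarity to the query and keep the top k."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_rag_prompt(query, docs):
    """Prepend retrieved snippets so the model can ground its answer."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"Answer using the documents below.\n{context}\nQuestion: {query}\nAnswer:"

corpus = [
    "Command is Cohere's flagship generative model.",
    "RAG grounds model answers in retrieved documents.",
    "Tokenizers split text into subword units.",
]
top = retrieve("how does RAG ground answers in documents", corpus)
print(build_rag_prompt("how does RAG ground answers in documents", top))
```

A production system would swap the bag-of-words scorer for dense embeddings and send the assembled prompt to a generative model; the grounding structure stays the same.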
@max_nlp
Max Bartolo
3 years
Grateful that @sleepinyourhat could visit us @ucl_nlp yesterday to share insights on navigating hype in #NLProc . I was curious whether the Table 1 () trends held for our best @DynabenchAI QA models, so I ran some experiments. Turns out that... [1/n]
Tweet media one
@sleepinyourhat
Sam Bowman
3 years
You'll sometimes see the meme that NLP is solved. That's hype, and it's doing harm in the real world. But it's worth thinking about what it'd look like to actually achieve what we're aiming for. (📄 paper, thread 🧵)
Tweet media one
10
106
536
2
10
24
@max_nlp
Max Bartolo
2 months
Command R+ available first on Microsoft Azure and on our platform, and coming very soon to all major cloud providers 🔥
@satyanadella
Satya Nadella
2 months
Azure will be the first cloud to offer Cohere's latest LLM, as we build on our commitment to offer customers the broadest selection of state of the art and open source models.
96
222
2K
0
0
24
@max_nlp
Max Bartolo
2 months
Performance also != Chatbot Arena Elo. But a massive improvement over previous plots! 🔥 The main takeaway for me here is that we need significantly better evals that reflect the real-world value created by LLMs
@maximevoisin_ai
Maxime Voisin
2 months
👀
Tweet media one
17
62
369
0
3
23
@max_nlp
Max Bartolo
3 years
#EMNLP2021 Want to know more about our work on Improving QA Model Robustness with Synthetic Adversarial Data Generation @ucl_nlp & @facebookai ? Come say hi on !
1
4
23
@max_nlp
Max Bartolo
4 months
If you're doing any synthetic data gen, I'd encourage you to explore synthetic adversarial data generation (e.g. ). Synth data can help fill gaps around existing data (~fancy paraphrasing) but it can also help elevate model capabilities!
2
2
22
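The core loop of synthetic adversarial data generation mentioned above can be sketched as: generate candidate examples, then keep only the ones the current model gets wrong. A toy illustration, where the arithmetic task and the `buggy_add` stand-in "model" are invented for the demo:

```python
def adversarial_filter(candidates, model_predict):
    """Keep only generated examples the current model gets wrong --
    the filtering step at the heart of adversarial data generation."""
    return [(q, gold) for q, gold in candidates if model_predict(q) != gold]

# Stand-in model: adds two numbers but drops the carry from the units digit.
def buggy_add(q):
    a, b = map(int, q.split("+"))
    return (a % 10 + b % 10) % 10 + 10 * (a // 10 + b // 10)

# Candidate generator: here just exhaustive enumeration of small sums.
candidates = [(f"{a}+{b}", a + b) for a in range(20) for b in range(20)]
hard = adversarial_filter(candidates, buggy_add)
print(len(hard), hard[:3])
```

The surviving examples concentrate exactly on the model's failure mode (sums that carry), which is why such data can elevate capabilities rather than just paraphrase what the model already handles.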
@max_nlp
Max Bartolo
4 years
Dynabench from @facebookai in collaboration with @UCL_NLP , @stanfordnlp and @UNCNLP is now live! Can you ask questions that the models can't answer correctly? For more info on the QA task see w/ @ARoberts9 @Johannes_Welbl @riedelcastro and Pontus Stenetorp
@douwekiela
Douwe Kiela
4 years
I’m super excited to announce Dynabench - a new and ambitious research platform for dynamic data collection and benchmarking: 1/n
9
117
460
0
6
22
@max_nlp
Max Bartolo
2 years
Honoured that our work investigating the sensitivity of Large Language Models to prompt sample ordering has been selected as an #ACL2022 outstanding paper! We also manage to find better orderings automatically without relying on held-out examples.➡️ 🚀
@yaolu_nlp
Yao Lu
2 years
Excited to receive an ACL outstanding paper award, with @max_nlp @latticecut @riedelcastro @ucl_nlp ! TL;DR If prompting is not working, change the order, the performance may jump from random-guess to SOTA. How to find fantastically ordered prompts? Here➡️
Tweet media one
5
32
206
2
2
22
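The finding quoted above, that reordering the same few-shot examples can swing performance from random-guess to SOTA, suggests a brute-force remedy: score every permutation of the shots and keep the best. A toy sketch with a stand-in scorer (a real version would query an LLM, and the paper's method finds good orderings without held-out examples or exhaustive search):

```python
import itertools

def build_prompt(examples, query):
    """Concatenate few-shot examples in a given order, then the query."""
    shots = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in examples)
    return f"{shots}\nReview: {query}\nSentiment:"

def best_ordering(examples, probe_queries, score_fn):
    """Try every permutation of the few-shot examples and keep the one
    that score_fn rates highest on the probe queries."""
    scored = []
    for perm in itertools.permutations(examples):
        total = sum(score_fn(build_prompt(perm, q)) for q in probe_queries)
        scored.append((total, perm))
    return max(scored, key=lambda t: t[0])[1]

# Stand-in scorer: a real implementation would use model probabilities.
# Here we simply prefer prompts whose shot labels alternate, for a
# deterministic demo of the search loop.
def toy_score(prompt):
    labels = [ln.split(": ")[1] for ln in prompt.splitlines()
              if ln.startswith("Sentiment: ")]
    return sum(a != b for a, b in zip(labels, labels[1:]))

shots = [("great film", "positive"), ("dull plot", "negative"),
         ("loved it", "positive"), ("waste of time", "negative")]
order = best_ordering(shots, ["fine movie"], toy_score)
print([y for _, y in order])
```

Exhaustive search is only feasible for a handful of shots (k! permutations), which is one reason automatic ordering methods matter.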
@max_nlp
Max Bartolo
3 years
Exciting work further confirming that adversarial data collection leads to harder examples, and highlighting important points on annotation quality and the potential benefits of using models in the dataset creation loop 🤔🤖
@zhansheng
Jason Phang
3 years
NLP benchmarks are increasingly saturated, making it difficult to measure further improvements in models. What if we used adversarial filtering to identify the most challenging *evaluation* examples, and build benchmarks based on them? 🧵1/x
Tweet media one
2
15
101
0
3
21
@max_nlp
Max Bartolo
6 years
Great to see our #emnlp2018 paper "Interpretation of Natural Language Rules in Conversational Machine Reading" featured in this week's @seb_ruder newsletter ! @riedelcastro @marzieh_saeidi @PSH_Lewis @sameer_ @_rockt @gbouchar @mikesheldon
0
5
21
@max_nlp
Max Bartolo
2 years
#ChatGPT 's greatest contribution by far is the (mostly adversarial) data it enables @OpenAI to collect 📈
1
0
20
@max_nlp
Max Bartolo
3 months
So trendy 🔥
@CohereForAI
Cohere For AI
3 months
Less than 24 hours after release, C4AI Command-R claims the #1 spot on the Hugging Face leaderboard! We launched with the goal of making generative AI breakthroughs accessible to the research community - so exciting to see such a positive response. 🔥
Tweet media one
2
21
138
0
3
20
@max_nlp
Max Bartolo
19 days
So the people have lost interest in asking "draw an ASCII unicorn"-style questions? Would love to see a much deeper and more granular analysis of Chatbot Arena prompts and what real-world utility these correlate with
@soumithchintala
Soumith Chintala
21 days
it is interesting that GPT-4o's Elo is lower at 1287 than its initial 1310 score. On coding, it regressed by even more absolute points, from 1369 to 1307.
Tweet media one
12
20
243
2
3
20
@max_nlp
Max Bartolo
2 years
I may be biased but @CohereAI embeddings are the best embeddings 🔥
@Julian_Risch
Julian Risch
2 years
Another Haystack release is out! Among other highlights, v1.11 supports @CohereAI embeddings in document retrieval! 🎉
0
1
25
1
0
19
@max_nlp
Max Bartolo
24 days
Command R topping the Open Arabic LLM Leaderboard. Maltese is heavily influenced by Arabic so particularly excited to see progress towards models that will eventually speak my language! 🤩🇲🇹
@clefourrier
Clémentine Fourrier 🍊
24 days
New on the hub: Arabic LLM Leaderboard! Arabic has at least 380M speakers & is one of the most spoken languages... but how good are LLMs at it? @alielfilali01 contacted @TIIuae and @huggingface to know, and collaborate around a new leaderboard!
2
17
68
1
3
17
@max_nlp
Max Bartolo
2 years
All the SOTA multimodal models we tested perform poorly on the specifically-constructed Winoground evaluation set for visio-linguistic compositional reasoning. Check it out! 👇
@TristanThrush
Tristan Thrush
2 years
Happy to announce our new CVPR paper - Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality. All tested SOTA multimodal models perform very poorly on our new vision-language eval dataset. Paper: #CVPR2022 , #NLProc 1/5
8
44
247
0
3
18
@max_nlp
Max Bartolo
2 years
QA in #NLProc is not yet solved! Can you come up with questions that an AI finds challenging while contributing to what may become the go-to QA benchmark eval set? All for just 100 questions (~1.5hrs)! Sign up at P.S. Prizes available! 🏆 Please share! 🚀
@DADCworkshop
Dynamic Adversarial Data Collection Workshop
2 years
Track 1 of our Shared Task kicks off on Mon 2nd May. Do you have a knack for coming up with challenging QA examples? Prove it while competing against the AI and your peers! We will also have prizes thanks to @commons_ml . Sign up at 🚀 #NLProc #NAACL2022
0
3
11
1
13
17
@max_nlp
Max Bartolo
5 years
Exciting new dataset combining reading comprehension with discrete reasoning over paragraphs!
@nlpmattg
Matt Gardner
5 years
Announcing DROP, a new reading comprehension benchmark that requires discrete reasoning over paragraphs of text. New @NAACLHLT paper by @ddua17 , @yizhongwyz , @pdasigi , @GabiStanovsky , @sameer_ , and me.
5
107
341
0
2
16
@max_nlp
Max Bartolo
2 years
What. An. Event! 🎇 Massive thanks to our speakers, panelists, authors, shared task participants and a huge heartfelt thanks to our fantastic organising team and sponsors for making it all possible! ❤️
@DADCworkshop
Dynamic Adversarial Data Collection Workshop
2 years
We had an absolute blast at our social, big up to @RaphiRaph_ for the venue....truly inspirational 😍😍 But our #NAACL 22 journey has come to an end so we'll be signing out until next year 🥹🥹 To all the #DADC fans, you've been dynamic, adversarial and awesome xox♥️
Tweet media one
Tweet media two
0
3
31
0
1
16
@max_nlp
Max Bartolo
3 years
Super exciting news: You can now create your own Dynabench tasks using Dynatask!
@AIatMeta
AI at Meta
3 years
Today, we’re unlocking @DynabenchAI , a first-of-its-kind platform for dynamic AI benchmarking. AI researchers can now create their own custom tasks to better evaluate the performance of #NLP models in more dynamic, & realistic settings for free.
3
59
213
1
5
16
@max_nlp
Max Bartolo
3 months
I'll see you and raise you the 4th on the list
Tweet media one
@RichardSocher
Richard Socher
3 months
you vs pplx substance vs hype
Tweet media one
Tweet media two
59
30
487
1
4
15
@max_nlp
Max Bartolo
2 months
Adding models is great (and it should have them all), but the real problem with this plot is that Performance != MMLU
@osanseviero
Omar Sanseviero
2 months
Everyone is adding models to the MMLU vs activated params plot, so here is a super quick one with more models. Everyone seems to forget about those not trained in the US/Europe: 01-ai Yi, InternLM, Qwen, and DeepSeek. (btw just use to compare MMLU)
Tweet media one
8
26
139
1
0
16
@max_nlp
Max Bartolo
5 months
So slick!
@aidangomez
Aidan Gomez
5 months
Cohere is making it dramatically easier to build applications using RAG. We've released code that makes connecting LLMs to your private sources of knowledge seamless. Here's how to give your model access to 100 sources like Google Drive, Slack, GitHub, Pinecone, and more. 🧵
Tweet media one
33
106
693
0
2
14
@max_nlp
Max Bartolo
4 years
Exciting PhD opportunity if you're interested in RL and NLP!
@_rockt
Tim Rocktäschel
4 years
Interested in doing a PhD at @UCL_DARK ? @egrefen and I are looking for strong&diverse applicants for UCL scholarships. Please email CV, personal statement, and research proposal to ucl-dark-phd-2020 @googlegroups .com by Dec 1. Interviews Dec 7-11. Lab site:
Tweet media one
9
38
139
0
4
14
@max_nlp
Max Bartolo
3 years
Really proud to be able to contribute to this exciting project changing the way we approach benchmarking for #NLP #NLProc . Feel free to reach out if you'd like to learn more!
@DynabenchAI
Dynabench
3 years
The Dynabench paper, accepted at #NAACL2021 , is out! The paper introduces our unified research platform for dynamic benchmarking on (so far) four initial NLU tasks. We also address some potential concerns and talk about future plans. (1/4)
2
31
62
0
3
15
@max_nlp
Max Bartolo
7 months
Coral knows.
@xvarunxx
Varun Kumethi
7 months
ChatGPT vs @cohere ’s Coral
Tweet media one
Tweet media two
3
10
54
0
2
15
@max_nlp
Max Bartolo
26 days
Chips are way too carb-dense to sustain a balanced and healthy diet 🍟
@tsarnick
Tsarathustra
28 days
Sam Altman says instead of Universal Basic Income, there should be Universal Basic Compute, where everybody gets a slice of GPT-7's compute
900
267
2K
0
0
14
@max_nlp
Max Bartolo
4 months
Great title and very interesting work!
@JulieKallini
Julie Kallini ✨
5 months
Do LLMs learn impossible languages (that humans wouldn’t be able to acquire) just as well as they learn possible human languages? We find evidence that they don’t! Check out our new paper… 💥 Mission: Impossible Language Models 💥 ArXiv: 🧵
Tweet media one
12
114
477
0
1
13
@max_nlp
Max Bartolo
8 months
🚨 Exciting new work by @tomhosking investigating what human feedback does and does not capture! 🚨
@tomhosking
Tom Hosking
8 months
🚨 New paper 🚨 I’m excited to share the findings from my internship at @cohere with @max_nlp tl;dr Human feedback under-represents the factuality of LLM output, and annotators are less likely to spot factual errors in more assertive outputs!
8
61
307
1
0
14
@max_nlp
Max Bartolo
2 years
Trying out a few examples for the @DADCworkshop shared task () and I'm blown away. The AI should not be THIS good! Think you can do better? Try it out at
Tweet media one
1
1
12
@max_nlp
Max Bartolo
2 years
We will also be announcing the Call for Participation for our Shared Task in the coming days. Stay tuned! w/ @hannahrosekirk @EntilZhaPR @katemargatina @TristanThrush @robinomial @adinamwilliams @douwekiela & others
0
4
13
@max_nlp
Max Bartolo
5 years
Misinformation is a global problem. Fascinating to hear what's being done about it around the world from such a diverse panel of global experts ranging from Argentina to Africa to India and beyond. Thanks @TTOConference #TTOCon
1
3
13
@max_nlp
Max Bartolo
28 days
Also a massive thanks to Dirk Groeneveld, @soldni and @natolambert from @allen_ai , and @BlancheMinerva from @AiEleuther for the extremely valuable discussions and feedback (and for their commitment to developing open models that make such investigations possible)!
1
0
6
@max_nlp
Max Bartolo
4 years
Check out the work being presented by @ucl_nlp at @emnlp2020 in November. Looking forward!
@ucl_nlp
UCL Natural Language Processing
4 years
We will be presenting a few papers at @emnlp2020 in November (7 in the main conference, 2 in Findings), together with some amazing collaborators! 🤖 We are looking forward to discuss our research with you 🙂 #EMNLP2020 1/N
1
11
47
0
0
12
@max_nlp
Max Bartolo
3 months
Very cool to see so much ❤️ for what we're building @cohere !
@subnetmarco
Marco Palladino
3 months
Here are the most popular LLM providers used with Kong's AI Gateway since its release last month, @OpenAI taking half the cake :) We are about to ship new exciting AI infrastructure capabilities to simplify building AI applications, and managing them at scale. You can…
Tweet media one
1
5
29
0
0
12
@max_nlp
Max Bartolo
2 months
What? This is crazy 👀
@carrigmat
Matthew Carrigan
2 months
Alright, strap in. Support for Command-R+ was merged into llama.cpp exactly 4 hours ago. We're going to start talking to a GPT-4 level model on local hardware without a GPU. If you have 64GB of RAM, feel free to follow along 🧵
24
122
1K
0
0
12
@max_nlp
Max Bartolo
2 years
Super excited to be working with @PSH_Lewis again and help build the future at @CohereAI !
@PSH_Lewis
Patrick Lewis
2 years
🚨Life update🚨 After 4 wonderful years, I’ve decided it’s time for me to move on from FAIR, and today is my first day at @cohereAI ! Super excited for the next chapter and to work with Cohere’s world class team, working out of the beautiful London office in Soho! (1/3)
Tweet media one
23
10
389
0
0
11
@max_nlp
Max Bartolo
2 years
Tweet media one
0
0
11
@max_nlp
Max Bartolo
2 months
RAG in 4 minutes on an iPad with Command R+ 🤩 During a Q&A session yesterday I was asked why we chose to make model weights available. This is why. Let the builders build! 🚀
@ryancarson
Ryan Carson
2 months
Ever wanted to learn how to set up RAG with an LLM? Sounds intimidating so you’ve avoided it? It’s SO easy now because of amazing dev tooling. Here’s how I did it in 4 minutes, on my iPad, while I was waiting for my train from DC back to Connecticut: 1. Signed up for a free…
Tweet media one
5
20
99
0
0
11
@max_nlp
Max Bartolo
5 years
Finally, a leaderboard for Question Generation!
@tomhosking
Tom Hosking
5 years
It's about time question generation got some love! Introducing the AQ Leaderboard to track state-of-the-art:
0
5
19
0
2
10
@max_nlp
Max Bartolo
2 years
Intriguing examples of how prompt selection affects few-shot performance. How do LLMs *really* use prompts?
@douwekiela
Douwe Kiela
2 years
We know that large language models are very sensitive to prompts in few-shot learning, but @maxbartolo pointed out to me that the ground truth labels don’t actually matter all that much! Check out this example for BLOOM, with opposing labels--is this something that’s well-known?
Tweet media one
Tweet media two
8
7
71
0
1
10
@max_nlp
Max Bartolo
2 years
DADC workshop paper submission deadline extended to April 15th! More details here:
@DADCworkshop
Dynamic Adversarial Data Collection Workshop
2 years
Following extensions by other NAACL workshops, we have also decided to extend the submission deadline for papers from April 8 to April 15 (AoE). We look forward to your submissions! Details: Paper Submission: #NAACL2022 #NLProc
0
7
13
0
3
9
@max_nlp
Max Bartolo
4 years
I'll be discussing recent work on adversarial human annotation in collaboration with industry during the @AI_UCL session at #TheAlgo2020 conference (Nov. 12th @UCL Online). Register here:
Tweet media one
@EmreKazim_
Emre Kazim
4 years
1 [Agenda thread]/ #Algo2020 conference! 12 Nov. @UCL Online -Free & Open to all #AI enthusiasts! Register: One-day, multi-stakeholder conference on AI & other Disruptive Tech. Check out the agenda in this thread & here:
Tweet media one
1
4
5
0
3
10
@max_nlp
Max Bartolo
1 month
Interested in humans, feedback, or both? Come say hi! 👋
@tomhosking
Tom Hosking
1 month
I'm looking forward to presenting this at @iclr_conf in Vienna next week! 🇦🇹 If you'd like to discuss the paper, human feedback, discrete representations for NLP or @cohere come and find me and @max_nlp in poster session #3 on Wednesday @ 0945h local time! 🎉
1
6
38
3
3
10
@max_nlp
Max Bartolo
2 months
@BlackHC @huggingface @CohereForAI @cohere Not sure if this affects anything but are these the results for CohereForAI/c4ai-command-r-plus or CohereForAI/c4ai-command-r-plus-4bit? Also, curious, are the numbers reported the strict-match or flexible-extract results from the Eleuther LM Eval Harness?
0
0
1
@max_nlp
Max Bartolo
2 years
This is a fantastic opportunity to participate in a shared task and compete with researchers from around the world. Can YOU beat the AI?? 🤔🆚🤖 All it takes is 100 examples (track 1)! Sign up today:
@DADCworkshop
Dynamic Adversarial Data Collection Workshop
2 years
You now have until May 15th (the end of the Track 1 example creation window) to register your team for the DADC shared task (). Join the 10 awesome teams from around the world who have already signed up! We'll also have prizes thanks to @commons_ml !! 🏆🤩
0
7
6
1
6
9
@max_nlp
Max Bartolo
14 days
@aidangomez @yichern_tan have you tried edible glue? I hear that's how the Italians do it
1
0
6
@max_nlp
Max Bartolo
2 years
Great read! The benchmarking section is by far the most exciting (🤩) but the rest is pretty cool too! 🚀
@seb_ruder
Sebastian Ruder
2 years
ML and NLP Research Highlights of 2021 These are the research areas and papers I found most exciting & inspiring in 2021.
27
417
1K
0
3
9
@max_nlp
Max Bartolo
5 years
@pminervini presenting work on Neural Link Prediction (and the fourth @ucl_nlp talk of the day involving the Obamas) #NLProc
Tweet media one
1
2
9
@max_nlp
Max Bartolo
2 years
Exciting new work on using data generation to mitigate spurious correlations along with de-biased versions of popular NLI datasets being made available!
@YuxiangJWu
Yuxiang (Jimmy) Wu
2 years
🚨New ACL2022 paper!🚨“Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets”. Read the paper here: , and check out the thread below w/ @nlpmattg , Pontus Stenetorp, @pdasigi @ai2_allennlp 🧵1/N
Tweet media one
3
21
128
0
3
9
@max_nlp
Max Bartolo
2 years
Check out this blog post for an intro to dynamic adversarial data collection. And don't forget to join us at #NAACL2022 for the @DADCworkshop on July 14th!
@MLCommons
MLCommons
2 years
DADC improves AI accuracy through higher quality and more diverse data collected with human and AI collaboration. We believe this will help the community build robust ML, which is why we are sponsoring @DADCWorkshop @DynabenchAI at #NAACL2022
0
5
17
0
2
9
@max_nlp
Max Bartolo
2 years
" @DynabenchAI relies on crowdworkers"... But it doesn't have to! The @DADCworkshop shared task is a fantastic opportunity for the wider #NLProc community to contribute -- and it's currently underway. Can YOU beat the AI?: 🚀
0
4
9
@max_nlp
Max Bartolo
5 years
First few #MLIndex posts are finally out and can be found at . Starting off light, but more to come soon...
1
0
9