Graham Neubig Profile
Graham Neubig

@gneubig

Followers: 38K · Following: 4K · Media: 453 · Statuses: 4K

Associate professor @LTIatCMU. Co-founder/chief scientist @allhands_ai. I mostly work on modeling language.

Joined September 2010
@gneubig
Graham Neubig
6 years
So apparently the cool pictures of the black hole today are from the algorithm in Bouman et al. 2016, a CVPR paper that has been cited a total of 11 times. Citations are not necessarily an indication of impactful work, esp. multidisciplinary work!
11
478
2K
@gneubig
Graham Neubig
1 year
Google’s Gemini recently made waves as a major competitor to OpenAI’s GPT. Exciting! But we wondered: how good is Gemini really? At CMU, we performed an impartial, in-depth, and reproducible study comparing Gemini, GPT, and Mixtral. Paper: 🧵
Tweet media one
29
261
1K
@gneubig
Graham Neubig
4 years
2021 version of CMU "Neural Networks for NLP" slides and videos are being posted in real time! Check it out for a comprehensive graduate-level class on NLP! New this year: an assignment on implementing parts of your own NN toolkit.
Tweet media one
2
284
1K
@gneubig
Graham Neubig
2 years
I had to travel 26 hours and spend $2000+ to join #ICLR2023 in Rwanda. But people in Africa have to do this every time a conference is held in the US. What happens when we make it easier to participate? 1530% higher registrations from Africa. This is important and must continue.
Tweet media one
17
185
1K
@gneubig
Graham Neubig
5 years
I've finished uploading the lecture videos for CMU CS11-747 "Neural Networks for NLP"'s 2020 edition: Check it out if you're interested in a comprehensive graduate-level course on modern NLP methods!
10
272
916
@gneubig
Graham Neubig
2 years
OpenAI recently added a method to make asynchronous calls, which is good if you want many calls quickly. But it’s not super-well-documented, so I wrote a quick demo of how to make many calls at once, e.g. 100+ in a few seconds. Hope it's helpful!
Tweet media one
23
189
881
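The demo itself isn't reproduced here, but the core pattern is a short `asyncio` sketch. `fake_completion` below is a placeholder for a real async client coroutine (e.g. something like `AsyncOpenAI().chat.completions.create(...)`); the names and delays are illustrative, not the actual demo code:

```python
import asyncio

async def fake_completion(prompt: str) -> str:
    # Placeholder for a real async API call; swap in your client's coroutine.
    await asyncio.sleep(0.01)  # simulate network latency
    return f"echo: {prompt}"

async def run_batch(prompts: list[str]) -> list[str]:
    # Launch every request concurrently; total wall time is roughly
    # one request's latency rather than the sum of all of them.
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

results = asyncio.run(run_batch([f"prompt {i}" for i in range(100)]))
print(len(results))  # 100
```

With a real async client, the same fan-out is how 100+ calls can finish in a few seconds instead of minutes of sequential requests.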
@gneubig
Graham Neubig
5 years
I have a joke about neural language models. I have a joke about neural language models. I have a joke about neural language models. I have a joke about neural language models.
11
85
868
@gneubig
Graham Neubig
5 months
How far are we from having competent AI co-workers that can perform tasks as varied as software development, project management, administration, and data science? In our new paper, we introduce TheAgentCompany, a benchmark for AI agents on consequential real-world tasks.
Tweet media one
18
142
822
@gneubig
Graham Neubig
6 years
Finished uploading all videos for the 2019 edition of CMU CS11-747 "Neural Networks for NLP": Like other offerings (e.g. Stanford CS224n) it covers the basics, but it's also a grad course with more topics, so it might be a good choice if you want to go deeper!
5
225
793
@gneubig
Graham Neubig
3 years
We've started the Fall 2022 edition of 🎓CMU CS11-711 Advanced NLP!🎓 Follow along for:
* An intro of core topics
* Timely content: prompting, retrieval, bias/fairness
* Content on NLP research methodology
Page: Videos:
Tweet media one
9
194
765
@gneubig
Graham Neubig
11 months
Announcement: @rbren_dev, @xingyaow_, and I have formed a company! Our name is All Hands AI 🙌 And our mission is to build the world’s best AI software development agents, for everyone, in the open. Here’s why I think this mission is important 🧵
Tweet media one
32
95
707
@gneubig
Graham Neubig
2 months
I created a Python project starter repo for students that helps maintain good code quality while doing research projects: I was opinionated and made only one choice for each tool, but there are other options too!
Tweet media one
17
97
686
@gneubig
Graham Neubig
9 months
We started the Fall 2024 version of CMU CS11-711 Advanced NLP 🎓 Follow along to learn about the latest in NLP, LLMs, agents, etc.
* Materials:
* Videos:
5
130
664
@gneubig
Graham Neubig
6 years
2019 edition of CMU "Neural Networks for NLP" is starting tomorrow! We'll post slides/lecture videos, feel free to follow along. 2019 brings new classes on contextualized word representations (ELMo/BERT) and model interpretation, plus PyTorch/DyNet code examples.
Tweet media one
Tweet media two
Tweet media three
3
168
597
@gneubig
Graham Neubig
1 year
I made some new class slides on “a tour of modern LMs” that have some observations about characteristics of recent LLMs, mostly focusing on open LLMs where we know their details: Check it out if interested, and feedback is welcome!
Tweet media one
6
128
586
@gneubig
Graham Neubig
1 year
Researchers often have to ask for recommendation letters for visa/job applications, etc. I wrote a script that allows you to find who cites your papers frequently to create a list of potential letter writers: Hope it's helpful, improvements are welcome!
4
95
579
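The script itself is linked from the tweet, but the core tallying step is easy to sketch. Here `citing_papers` is hypothetical input (in practice you would pull it from a citation database such as Semantic Scholar); the counting logic is just a `Counter` over author lists:

```python
from collections import Counter

# Hypothetical input: one entry per paper that cites you, with its authors.
citing_papers = [
    {"title": "Paper A", "authors": ["Ada Lovelace", "Alan Turing"]},
    {"title": "Paper B", "authors": ["Ada Lovelace"]},
    {"title": "Paper C", "authors": ["Grace Hopper", "Ada Lovelace"]},
]

# Count how many citing papers each author appears on
# (set() so an author isn't double-counted within one paper).
counts = Counter(a for paper in citing_papers for a in set(paper["authors"]))

# The most frequent citers are candidate letter writers.
top = counts.most_common(2)
print(top[0])  # ('Ada Lovelace', 3)
```

The real script presumably also filters out your own co-authors; that is an easy extension with a set-difference against your collaborator list.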
@gneubig
Graham Neubig
3 years
Recently some have complained about prompting as an approach to NLP: "It's so brittle," "prompt engineering is hacky," etc. But there's another way to view it: prompt engineering is another way of tuning the model's parameters, and it's human-interpretable! See 1/2
Tweet media one
3
99
559
@gneubig
Graham Neubig
4 months
Summary in case you missed any LLM research from the past month:
* RL on math datasets improves math ability v1
* RL on math datasets improves math ability v2
* RL on math datasets improves math ability v3
* RL on math datasets improves math ability v4
* RL on math datasets...
14
40
553
@gneubig
Graham Neubig
1 year
I'm excited to be back in the classroom for CMU 11-711 Advanced NLP this semester! We revamped the curriculum to take into account recent advances in LLMs, and we have a new assignment "build-your-own-LLaMa". We'll be posting slides/videos going forward.
10
97
535
@gneubig
Graham Neubig
6 years
"Language models as knowledge bases?" they asked: "A cat has four kidneys", replied GPT-2.
Tweet media one
18
137
503
@gneubig
Graham Neubig
5 months
Congratulations to OpenAI on the release of o3. The results are impressive and it's important that this technology remains accessible to more than a few powerful companies. With hard work and determination I expect the open source community can catch up in 3-6 months. Let's do it.
13
36
533
@gneubig
Graham Neubig
4 years
The semester is now over, and all of the videos for Neural Networks for NLP are now online! We feature new classes/sections on probing language models, sequence-to-sequence pre-training, and bias in NLP models by the wonderful TAs. Check them out:
@gneubig
Graham Neubig
4 years
2021 version of CMU "Neural Networks for NLP" slides and videos are being posted in real time! Check it out for a comprehensive graduate-level class on NLP! New this year: an assignment on implementing parts of your own NN toolkit.
Tweet media one
2
128
498
@gneubig
Graham Neubig
4 years
Powerful LMs such as GPT-3 and T5 have impressive ability to answer questions by continuing a textual prompt. However, how can we know when an LM knows the answer with confidence, and when it's making a random guess? Our new preprint asks this: 1/N
Tweet media one
4
76
484
@gneubig
Graham Neubig
5 years
At @LTIatCMU we held a week-long "Low Resource Natural Language Processing Bootcamp" with 8 sets of lectures & exercises on getting NLP to work in languages where resources are less abundant. We're making them available for all who are interested here: 1/
Tweet media one
11
155
466
@gneubig
Graham Neubig
6 months
We are now done with all classes for CMU CS11-711 Advanced NLP! Slides: Videos: Hope this is useful to people 😀
@gneubig
Graham Neubig
9 months
We started the Fall 2024 version of CMU CS11-711 Advanced NLP 🎓 Follow along to learn about the latest in NLP, LLMs, agents, etc.
* Materials:
* Videos:
6
92
483
@gneubig
Graham Neubig
3 years
I was a bit short on research ideas, so I decided to ask @chrmanning (as simulated by @huggingface's BLOOM) for some inspiration. The advice was...
Tweet media one
18
49
465
@gneubig
Graham Neubig
2 years
GPT-4 has been out for 72 hours, and it could change the world! Here are some amazing and important things it *can't* do (yet) ⬇️
7
105
468
@gneubig
Graham Neubig
5 years
Looking forward to our *brand new class*, CMU CS11-737 "Multilingual Natural Language Processing" this semester with Yulia Tsvetkov and Alan Black! We're covering the linguistics, modeling, and data that you need to build NLP systems in new languages: 1/2
Tweet media one
Tweet media two
Tweet media three
Tweet media four
7
126
452
@gneubig
Graham Neubig
6 years
Reviewer #1: "This paper is too simple, reject."
Reviewer #2: "This paper is so simple, it's awesome! Strong accept!"
(Thank you reviewer #2!)
6
41
437
@gneubig
Graham Neubig
1 year
We have started posting CMU Advanced NLP lecture videos on YouTube: Check out the first 7!
1. Overview of NLP
2. Word Representation
3. Language Modeling
4. Sequence Modeling
5. Transformers
6. Generation Algorithms (by @abertsch72)
7. Prompting
@gneubig
Graham Neubig
1 year
I'm excited to be back in the classroom for CMU 11-711 Advanced NLP this semester! We revamped the curriculum to take into account recent advances in LLMs, and we have a new assignment "build-your-own-LLaMa". We'll be posting slides/videos going forward.
6
90
443
@gneubig
Graham Neubig
3 years
Happy to announce that I've formed a company, Inspired Cognition, together with @stefan_fee and @odashi_en! Our goal is to make it easier and more efficient to build AI systems (particularly NLP) through our tools and expertise. 1/2
Tweet media one
14
48
442
@gneubig
Graham Neubig
2 years
There are so many chatbots nowadays, it’s hard to keep up! To help out, we made an open source tool for automatic comparison of chatbots, and created a report on LLaMa, Alpaca, Vicuna, ChatGPT, Cohere, etc.! Report: Browser: 🧵⬇️
Tweet media one
8
100
415
@gneubig
Graham Neubig
2 years
CMU Advanced NLP is done for 2022! Check the videos on YouTube 😃. I also overhauled our assignments to reflect important skills in NLP for 2022: If you're teaching/learning NLP see the 🧵 and doc for more!
@gneubig
Graham Neubig
3 years
We've started the Fall 2022 edition of 🎓CMU CS11-711 Advanced NLP!🎓 Follow along for:
* An intro of core topics
* Timely content: prompting, retrieval, bias/fairness
* Content on NLP research methodology
Page: Videos:
Tweet media one
9
103
419
@gneubig
Graham Neubig
4 years
We have finished uploading our 23 class videos on Multilingual NLP, including two really great guest lectures:
* NLP for Indigenous Languages (by Pat Littell, CNRC):
* Universal NMT (by Orhan Firat, Google):
@gneubig
Graham Neubig
5 years
Looking forward to our *brand new class*, CMU CS11-737 "Multilingual Natural Language Processing" this semester with Yulia Tsvetkov and Alan Black! We're covering the linguistics, modeling, and data that you need to build NLP systems in new languages: 1/2
Tweet media one
Tweet media two
Tweet media three
Tweet media four
4
124
414
@gneubig
Graham Neubig
4 years
Happy to release the first video lectures for CMU 11-711 Advanced NLP (the successor to 11-747 Neural Nets for NLP) 😃 Check it out and follow along through the semester for an advanced graduate course on #nlproc! Site: Videos:
2
79
398
@gneubig
Graham Neubig
1 year
Apparently the original transformer figure was drawn in Illustrator, but I have a modifiable version in Keynote here in case it's useful to anyone:
@jxmnop
jack morris
1 year
what software was this made with? i don't think you can draw arrows that curve like that w/ Google Drawings
Tweet media one
2
45
401
@gneubig
Graham Neubig
8 months
It all makes sense now.
Tweet media one
3
46
401
@gneubig
Graham Neubig
2 years
I wrote a more efficient/robust OpenAI querying wrapper:
1. Parallel execution with adjustable rate limits
2. Automatic retries on failure
3. Interface to Huggingface/Cohere for comparison
This finished 33k completions in ≈1 hour! Available here:
Tweet media one
Tweet media two
@gneubig
Graham Neubig
2 years
OpenAI recently added a method to make asynchronous calls, which is good if you want many calls quickly. But it’s not super-well-documented, so I wrote a quick demo of how to make many calls at once, e.g. 100+ in a few seconds. Hope it's helpful!
Tweet media one
8
63
375
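The wrapper itself lives in the linked repo; as a rough sketch of points 1 and 2 (a concurrency cap plus automatic retries with backoff), with a stand-in `flaky_call` instead of a real OpenAI request. Everything here is illustrative, not the actual wrapper's API:

```python
import asyncio

MAX_CONCURRENCY = 8   # adjustable rate limit: max requests in flight
MAX_RETRIES = 3

attempts = {"n": 0}

async def flaky_call(prompt: str) -> str:
    # Stand-in for a real API request; fails once to exercise the retry path.
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("transient error")
    await asyncio.sleep(0)
    return prompt.upper()

async def call_with_retry(sem: asyncio.Semaphore, prompt: str) -> str:
    async with sem:  # at most MAX_CONCURRENCY requests run concurrently
        for attempt in range(MAX_RETRIES):
            try:
                return await flaky_call(prompt)
            except RuntimeError:
                await asyncio.sleep(0.01 * 2 ** attempt)  # exponential backoff
        raise RuntimeError("all retries failed")

async def main(prompts: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    return await asyncio.gather(*(call_with_retry(sem, p) for p in prompts))

out = asyncio.run(main(["a", "b", "c"]))
print(out)
```

The semaphore is what makes the rate limit "adjustable": raising `MAX_CONCURRENCY` trades API quota pressure for throughput, while the retry loop absorbs transient failures without aborting the whole batch.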
@gneubig
Graham Neubig
6 years
Happy to announce official release of compare-mt, a tool for holistic analysis of language generation systems (MT, summarization, response generation, etc.)! This is our "secret weapon" for analyzing our systems and understanding what's going right/wrong.
Tweet media one
4
117
364
@gneubig
Graham Neubig
4 years
If you're looking for some nice videos on cutting-edge NLP research, check out the @LTIatCMU YouTube Channel with presentations by LTI members and guest speakers! Our friends in China can also watch on Bilibili:
3
78
364
@gneubig
Graham Neubig
6 years
Excellent categorized machine translation reading list by the Tsinghua University NLP group: it has great coverage of modern papers and should be a good first stop if you want to learn about the state of the art in a particular sub-topic of MT.
2
121
360
@gneubig
Graham Neubig
4 years
One important thing I'd like everyone who is using NLP to know: when someone releases a wonderful new model that supports 100 languages, that doesn't mean that it works on 100 languages.
7
44
353
@gneubig
Graham Neubig
8 months
New blog: "Don't Sleep on Single-Agent Systems". Multi-agent systems are all the rage, but sometimes one agent is all you need! (and simpler, more maintainable, etc.). I also discuss design considerations for building versatile, powerful single agents.
6
63
361
@gneubig
Graham Neubig
3 years
We have released videos of CMU CS11-737 Multilingual NLP: Check them out if you're interested in learning about how to apply NLP and Speech technology to many different languages! 1/2
Tweet media one
Tweet media two
3
89
344
@gneubig
Graham Neubig
1 year
ACL has removed the anonymity period. This means that ACL submissions can be posted and discussed online at any time, although extensive PR is discouraged.
Tweet media one
5
86
344
@gneubig
Graham Neubig
8 years
Uploaded our vector representations for 1017 world languages. Use them for your multilingual NLP tasks!
4
139
336
@gneubig
Graham Neubig
9 months
The Information reports that OpenAI's new "strawberry" product will launch in ~2 weeks, using 10-20 seconds of inference-time compute: If you want to study up on methods for inference-time compute, our survey could be useful!
6
49
331
@gneubig
Graham Neubig
2 years
Here are the slides for my kick-off talk, a high level overview of the exciting promise and current issues with large language models:
Tweet media one
@gneubig
Graham Neubig
2 years
Exciting energy for the @LTIatCMU large language model event! Come on out this weekend if you're around Pittsburgh and interested in LLMs
Tweet media one
3
71
323
@gneubig
Graham Neubig
1 year
Updates for OpenDevin this week:
- CodeAct 1.3 agent with browsing and GitHub support
- With GPT-4o, 25% accuracy on SWE-Bench Lite, 4% over the SOTA we set last week!
- A new evals visualizer
- Plans to add more agents/evals, we'd love your help!
🧵
Tweet media one
11
58
324
@gneubig
Graham Neubig
5 years
Happy to release NN4NLP-concepts! It's a typology of important concepts that you should know to implement SOTA NLP models using neural nets. We'll reference this in CMU CS11-747 this year, trying to maximize coverage. 1/3
Tweet media one
@gneubig
Graham Neubig
5 years
2020 edition of CMU CS11-747 "Neural Networks for NLP" is starting tomorrow! We (co-teacher @stefan_fee and 6 wonderful TAs) restructured it a bit to be more focused on "core concepts" used across a wide variety of applications. 1/2
Tweet media one
2
108
315
@gneubig
Graham Neubig
1 year
The OpenDevin open-source coding assistant is really taking shape! We now have a frontend that connects to a rudimentary agent that solves coding tasks, a docker sandbox, and other things. Next up is optimizing accuracy, we welcome contributions!
3
58
317
@gneubig
Graham Neubig
1 year
The videos for the spring semester of CMU 11-711 Advanced NLP are now all available 📺. Thanks to the TAs, students in the class, and everyone who followed along. We're doing it again in the Fall!
@gneubig
Graham Neubig
1 year
I'm excited to be back in the classroom for CMU 11-711 Advanced NLP this semester! We revamped the curriculum to take into account recent advances in LLMs, and we have a new assignment "build-your-own-LLaMa". We'll be posting slides/videos going forward.
4
79
313
@gneubig
Graham Neubig
1 year
Honest question: what do people mean when they say a model is "aligned"? Is it semantically different from "fine-tuned", and if so, how?
69
24
314
@gneubig
Graham Neubig
1 year
We're excited about all the interest in our Gemini report and working to make it even better! This week we made major improvements, switching to the @MistralAI instruct model, and working with the Gemini team to reproduce their results. Updates below.
@gneubig
Graham Neubig
1 year
Google’s Gemini recently made waves as a major competitor to OpenAI’s GPT. Exciting! But we wondered: how good is Gemini really? At CMU, we performed an impartial, in-depth, and reproducible study comparing Gemini, GPT, and Mixtral. Paper: 🧵
Tweet media one
8
39
304
@gneubig
Graham Neubig
1 year
We've reached a small but exciting milestone for OpenDevin, the open source AI software engineer -- OpenDevin sends a pull request to the OpenDevin repo. You can see the PR here:
5
53
308
@gneubig
Graham Neubig
5 years
I've started to upload the videos for the Neural Nets for NLP class here: We'll be uploading the videos regularly throughout the rest of the semester, so please follow the playlist if you're interested.
@gneubig
Graham Neubig
5 years
2020 edition of CMU CS11-747 "Neural Networks for NLP" is starting tomorrow! We (co-teacher @stefan_fee and 6 wonderful TAs) restructured it a bit to be more focused on "core concepts" used across a wide variety of applications. 1/2
Tweet media one
3
69
292
@gneubig
Graham Neubig
5 years
2020 edition of CMU CS11-747 "Neural Networks for NLP" is starting tomorrow! We (co-teacher @stefan_fee and 6 wonderful TAs) restructured it a bit to be more focused on "core concepts" used across a wide variety of applications. 1/2
Tweet media one
9
72
294
@gneubig
Graham Neubig
3 years
I've seen quite a few #NAACL2022 papers that say "our code is available at [link]" but the code is not available at "[link]". Everyone, let's release our research code! It's better for everyone, and hey, messy code is better than no code.
8
28
293
@gneubig
Graham Neubig
6 years
Posted CMU "Neural Nets for NLP" lecture on sentence/contextualized word representations -- SkipThought, ELMo, BERT, etc. Attempted to make a systematic comparison along the model, training objective, and data dimensions, to bring method to the madness :)
0
86
283
@gneubig
Graham Neubig
8 years
Yesterday: "attention is all you need." Today: "you need a bunch of other stuff." Same authors 😀
3
99
280
@gneubig
Graham Neubig
7 months
One major weakness of open-source multimodal models was document and UI understanding. Not anymore! We trained a model on 7.3M web examples for grounding, OCR, and action outcome prediction, with great results. It's MultiUI, code/data/model are all open:
@xiangyue96
Xiang Yue
7 months
Working on multimodal instruction tuning and finding it hard to scale? Building Web/GUI agents but data is too narrow? Introducing 🚀MultiUI: 7.3M multimodal instructions from 1M webpage UIs, offering diverse data to boost text-rich visual understanding. Key takeaways:
5
40
292
@gneubig
Graham Neubig
4 years
There has been much interest in ML methods that generate source code (e.g. Python) from English commands. But does this actually help software developers? We asked 31 developers to use a code generation plugin, and found some interesting results: 1/7
Tweet media one
Tweet media two
3
67
286
@gneubig
Graham Neubig
7 years
Introducing "Stack-pointer Networks", a top-down architecture for transition-based dependency parsing where each head "points" to its children using attention (#ACL2018) State-of-the-art on 21/29 tested datasets with code available:
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
85
283
@gneubig
Graham Neubig
1 year
Updates from the OpenDevin open AI software engineer this week:
* A BrowserAgent that can effectively browse web sites
* Lots of work on building a comprehensive evaluation platform for agentic LLMs
* Lots of backend improvements to session management
🧵
2
41
288
@gneubig
Graham Neubig
1 year
I'm in Austria for #ICLR2024! Please stop by poster 287 on WebArena this morning and chat about benchmarking or building web agents!
Tweet media one
9
18
282
@gneubig
Graham Neubig
9 months
Excited to announce @allhands_ai has raised $5M to accelerate development of open-source AI agents for developers! I'm looking forward to further building out the software and the community, and making AI developers accessible to all 🚀
@allhands_ai
All Hands AI
9 months
We are proud to announce that All Hands has raised $5M to build the world’s best software development agents, and do it in the open 🙌 Thank you to @MenloVentures and our wonderful slate of investors for believing in the mission!
19
33
280
@gneubig
Graham Neubig
4 years
We've been on a multi-year effort to take steps towards understanding how well NLP/language tech serves people on a *global* scale. Here's a first report: We perform meta-analysis of performance across 7 tasks, and devise "global utility" metrics. 1/7
Tweet media one
Tweet media two
1
47
267
@gneubig
Graham Neubig
7 years
Just released "CoNaLa", a dataset/contest for broad-coverage generation of programs from English commands: 2,879 manually annotated examples, and 600k mined from StackOverflow to increase coverage; super-excited to bring NL->Code to the open domain!
Tweet media one
Tweet media two
1
109
266
@gneubig
Graham Neubig
1 year
Recently there were some great results from the new Mamba architecture by @_albertgu and @tri_dao. We did a bit of third-party validation, and:
1. The results are reproducible
2. Mamba 2.8B is competitive w/ some 7B models (!)
3. Mistral is still strong
@a13xba
Alex
1 year
Since some of you might be wondering whether Mamba 2.8B can serve as a drop-in replacement for some of the larger models, we've compared the Mamba model family to some of the most popular 7B models in @try_zeno. Report: 🧵 1/5
3
31
266
@gneubig
Graham Neubig
3 years
If you are applying to 🎓🤖grad programs in AI🤖🎓, here are three great resources:
1. Student perspectives on applications:
2. Example SoPs from recent applicants:
3. The CMU application mentorship program:
0
65
267
@gneubig
Graham Neubig
4 years
Super-excited for the official release of ExplainaBoard, a new concept in leaderboards for NLP: It covers *9* tasks with *7* functionalities to analyze, explore, and combine results. Please try it out, submit systems, and help improve evaluation for NLP!
@stefan_fee
Pengfei Liu
4 years
What's your system good/bad at? Where can your model outperform others? What are the mistakes that the top-10 systems make? We are always struggling with these questions. A new academic tool can help us answer them in a one-click fashion, and many more:
3
74
261
@gneubig
Graham Neubig
3 years
Retrieval-based models are increasingly important in NLP/QA. But an important factor in modeling text is knowing *where* it came from. Our #ICLR2022 paper proposes retrieval-based LMs that consider the "structural locality" of texts to improve retrieval: 🧵↓
Tweet media one
3
39
258
@gneubig
Graham Neubig
2 years
Lecture slides for my talk at UIUC and UPenn on "Is my NLP model working? The answer is harder than you think." I talk about state-of-the-art evaluation metrics for text generation, why they're important, and how you can use them to improve systems:
Tweet media one
Tweet media two
Tweet media three
Tweet media four
4
52
261
@gneubig
Graham Neubig
4 years
"Paraphrastic representations at scale" is a strong, blazing fast package for sentence embeddings by @johnwieting2. Paper: Code: Beats Sentence-BERT, LASER, USE on STS tasks, works multilingually, and is up to 6,000 times faster 😯
Tweet media one
Tweet media two
Tweet media three
3
45
259
@gneubig
Graham Neubig
3 years
CMU 11-711 Advanced NLP has drawn to a close! You can now access all class materials online. Slides: Videos: Hope it's useful, and stay tuned for "11-737 Multilingual NLP" next semester!
@gneubig
Graham Neubig
4 years
In Fall 2021, CMU is updating its NLP curriculum, and 11-747 "Neural Networks for NLP" is being repurposed into 11-711 "Advanced NLP", the flagship research-based NLP class 😃 More NLP fundamentals, still neural network methods. Stay tuned! (CMU students, please register!)
3
55
255
@gneubig
Graham Neubig
6 years
Cross-lingual transfer is a powerful tool for low-resource NLP. But when you build a system for a new language (say Bengali), what language do you transfer from? Our #ACL2019 paper "Choosing Transfer Languages for Cross-lingual Learning" asks this: 1/7
Tweet media one
3
64
248
@gneubig
Graham Neubig
8 years
Nice! Our paper on differentiable beam search (@kartik_goyal_, me, @redpony, and Taylor BK) was accepted to AAAI! Read to learn how to backprop through your search algorithm:
Tweet media one
1
57
254
@gneubig
Graham Neubig
5 years
Next year I will be looking for 1-2 PhD students who are interested in doing deep and impactful work on NLP! (areas are open, but I like multilingual NLP/compling, natural language interfaces, ML for NLP). Please apply below and mention me in your app: 1/2.
5
86
246
@gneubig
Graham Neubig
5 years
Really happy our paper on Differentiable Data Selection will appear at #ICML2020! The method is a *principled* way to choose which data goes into models and it's super-broadly applicable. We've already used it in multilingual models at #acl2020nlp too
@cindyxinyiwang
Xinyi Wang (Cindy)
5 years
Not all training data are equal, but how do we identify the good data efficiently at different stages of model training? We propose to train a data selection agent by up-weighting data that has a similar gradient to the gradient of the dev set:
Tweet media one
1
43
247
@gneubig
Graham Neubig
6 years
#ICLR2019 paper "Lagging Inference Networks and Posterior Collapse in VAEs". VAEs collapse to trivial solutions; we find this is because the inference network is poor at the beginning of training, then propose a simple solution of "aggressive update":
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
58
251
@gneubig
Graham Neubig
1 year
I'm looking for cost-effective and simple ways to serve LLMs that we trained or fine-tuned ourselves (7-70B range). What are the best options nowadays? (Self-promotion welcome!)
37
23
253
@gneubig
Graham Neubig
7 years
Posted "Neural Lattice Language Models", our new paper (TACL) on LMs that calculate probability of a sentence by marginalizing over a lattice! It's a nice and flexible framework for LMs that lets you consider ambiguity such as word sense, segmentation, etc
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
84
244
@gneubig
Graham Neubig
4 months
Are you interested in getting started in research related to LLMs, agents, speech, safety, fairness, or other aspects of language technology? At @LTIatCMU we're hosting an internship program for pre-doctoral students interested in these areas!
8
47
252
@gneubig
Graham Neubig
5 years
We have started uploading the lecture videos for CS11-737 to YouTube now! You can see the first two, on the class intro and typology.
@gneubig
Graham Neubig
5 years
Looking forward to our *brand new class*, CMU CS11-737 "Multilingual Natural Language Processing" this semester with Yulia Tsvetkov and Alan Black! We're covering the linguistics, modeling, and data that you need to build NLP systems in new languages: 1/2
Tweet media one
Tweet media two
Tweet media three
Tweet media four
6
71
244
@gneubig
Graham Neubig
8 months
One of my favorite things about using coding agents is that it's possible to do the things you wanted to do but never had time for. For me this is (1) making frontends, (2) doing little data visualizations. Here's a blog about frontend dev with agents:
4
32
245
@gneubig
Graham Neubig
1 year
With long-context LMs, we can now fit *thousands* of training examples in context! We perform an in-depth exploration of many-shot in-context learning, finding it surprisingly effective, providing huge increases over few-shot prompting, and competitive with fine-tuning!
@abertsch72
Amanda Bertsch @NAACL
1 year
In-context learning provides an LLM with a few examples to improve accuracy. But with long-context LLMs, we can now use *thousands* of examples in-context. We find that this long-context ICL paradigm is surprisingly effective, and differs in behavior from short-context ICL! 🧵
Tweet media one
4
31
242
@gneubig
Graham Neubig
5 years
New #acl2020nlp paper on "Generalizing Natural Language Analysis through Span-relation Representations"! We show how to solve 10 very different natural language analysis tasks with a single general-purpose method -- span/relation representations! 1/
Tweet media one
2
51
233
@gneubig
Graham Neubig
1 year
An interesting tidbit from the Mamba paper: the Transformer vs. Transformer++ comparison. Transformer is the original version, and Transformer++ is the LLaMa-2 version (SwiGLU/RoPE/training tweaks). Architectures/algorithms make a huge difference!
Tweet media one
4
35
236
@gneubig
Graham Neubig
8 months
I just did a data munging task for a research project in about 10 minutes with AI agents using OpenHands. Three months ago basically the same task took 2 PhD students and me several hours. AI is pretty clearly going to be revolutionary for science once everyone starts using it.
8
12
236
@gneubig
Graham Neubig
2 years
If you want to study NLP, LLMs, or broader language technology in grad school, please apply to @LTIatCMU! We have a great group of faculty covering many topics: I personally will be recruiting students on LLMs/agents/evaluation.
0
58
231
@gneubig
Graham Neubig
3 years
MEGA is a new method for modeling long sequences based on the surprisingly simple technique of taking the moving average of embeddings. Excellent results, outperforming strong competitors such as S4 on most tasks! Strongly recommend that you check it out:
Tweet media one
Tweet media two
@violet_zct
Chunting Zhou
3 years
I'm excited to share our work on a new sequence modeling architecture called Mega: Moving Average Equipped Gated Attention. Mega achieves SOTA results on multiple benchmarks, including NMT, Long Range Arena, language modeling, ImageNet and raw speech classification.
Tweet media one
2
32
233
@gneubig
Graham Neubig
5 years
So I was browsing the results for the new Google chatbot Meena, and they look pretty OK (if boring sometimes). However, every once in a while it enters "scary sociopath mode," which is, shall we say, sub-optimal 😨
Tweet media one
Tweet media two
8
30
224
@gneubig
Graham Neubig
5 years
I was invited to give a talk to the New York Circle of Translators on machine translation and its implications for the practice of translation. This gave me a good opportunity to reflect on what MT currently can and can't do. See slides/comments here!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
8
46
216
@gneubig
Graham Neubig
4 years
In Fall 2021, CMU is updating its NLP curriculum, and 11-747 "Neural Networks for NLP" is being repurposed into 11-711 "Advanced NLP", the flagship research-based NLP class 😃 More NLP fundamentals, still neural network methods. Stay tuned! (CMU students, please register!)
@gneubig
Graham Neubig
4 years
2021 version of CMU "Neural Networks for NLP" slides and videos are being posted in real time! Check it out for a comprehensive graduate-level class on NLP! New this year: an assignment on implementing parts of your own NN toolkit.
Tweet media one
1
19
222
@gneubig
Graham Neubig
5 years
This is a well-written overview! Also, for a higher-level, more philosophical take on search in generation models, see my recent class (slides: video: ). I discuss the relationship between model, search, and output quality.
Tweet media one
Tweet media two
Tweet media three
@huggingface
Hugging Face
5 years
The 101 for text generation! 💪💪💪 This is an overview of the main decoding methods and how to use them super easily in Transformers with GPT2, XLNet, Bart, T5, etc. It includes greedy decoding, beam search, top-k/nucleus sampling, and more. By @PatrickPlaten
Tweet media one
0
48
223
@gneubig
Graham Neubig
6 years
#acl2019nlp paper on "Beyond BLEU: Training NMT with Semantic Similarity" by Wieting et al.: I like this because it shows 1) a nice use case for semantic similarity, 2) that we can/should optimize seq2seq models for something other than likelihood or BLEU!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
5
58
220
@gneubig
Graham Neubig
1 year
Thanks to Devin for the contribution to OpenDevin! It's great to see that even AI programmers believe in the power of open source 😃
7
17
195
@gneubig
Graham Neubig
1 year
TL;DR on the results?. On all tasks, Gemini Pro achieved comparable but slightly lower accuracy than the current version of OpenAI's GPT 3.5 Turbo. Gemini and GPT were somewhat better than open-source contender Mixtral. But there’s quite a bit of nuance, let’s dig deeper…
Tweet media one
2
32
162
@gneubig
Graham Neubig
4 years
Just released a new survey on prompting methods, which use language models to solve prediction tasks by providing them with a "prompt" like "CMU is located in __". We worked really hard to make this well-organized and educational for both NLP experts and beginners, check it out!
@stefan_fee
Pengfei Liu
4 years
What is prompt-based learning, and what challenges are there? Will it be a new paradigm or a way for human-PLMs communication? How does it connect with other research and how to position it in the evolution of the NLP research paradigm? We released a systematic survey and beyond
Tweet media one
Tweet media two
Tweet media three
1
55
217
@gneubig
Graham Neubig
5 years
Excited to give a (virtual, recorded) talk about "The Low-resource NLP Toolbox, 2020 Version" at the AfricaNLP workshop at #ICLR2020! Slides: It's somewhat of a bird's-eye view, but also focuses heavily on our work at @LTIatCMU.
Tweet media one
2
44
215