
Graham Neubig
@gneubig
Followers
38K
Following
4K
Media
453
Statuses
4K
Associate professor @LTIatCMU. Co-founder/chief scientist @allhands_ai. I mostly work on modeling language.
Joined September 2010
I had to travel 26 hours and spend $2000+ to join #ICLR2023 in Rwanda. But people in Africa have to do this every time a conference is held in the US. What happens when we make it easier to participate? 1,530% higher registrations from Africa. This is important and must continue.
17
185
1K
Announcement: @rbren_dev, @xingyaow_, and I have formed a company! Our name is All Hands AI 🙌 And our mission is to build the world’s best AI software development agents, for everyone, in the open. Here’s why I think this mission is important 🧵
32
95
707
The semester is now over, and all of the videos for Neural Networks for NLP are now online! We feature new classes/sections on probing language models, sequence-to-sequence pre-training, and bias in NLP models by the wonderful TAs. Check them out:
2021 version of CMU "Neural Networks for NLP" slides and videos are being posted in real time! Check it out for a comprehensive graduate-level class on NLP! New this year: assignment on implementing parts of your own NN toolkit.
2
128
498
I was a bit short on research ideas, so I decided to ask @chrmanning (as simulated by @huggingface's BLOOM) for some inspiration. The advice was:
18
49
465
We have started posting CMU Advanced NLP lecture videos on YouTube. Check out the first 7!
1. Overview of NLP
2. Word Representation
3. Language Modeling
4. Sequence Modeling
5. Transformers
6. Generation Algorithms (by @abertsch72)
7. Prompting
I'm excited to be back in the classroom for CMU 11-711 Advanced NLP this semester! We revamped the curriculum to take into account recent advances in LLMs, and we have a new assignment "build-your-own-LLaMa". We'll be posting slides/videos going forward.
6
90
443
Happy to announce that I've formed a company, Inspired Cognition, together with @stefan_fee and @odashi_en! Our goal is to make it easier and more efficient to build AI systems (particularly NLP) through our tools and expertise. 1/2
14
48
442
CMU Advanced NLP is done for 2022! Check the videos on YouTube 😃. I also overhauled our assignments to reflect important skills in NLP for 2022. If you're teaching/learning NLP, see the 🧵 and doc for more!
We've started the Fall 2022 edition of 🎓CMU CS11-711 Advanced NLP!🎓 Follow along for:
* An intro of core topics
* Timely content: prompting, retrieval, bias/fairness
* Content on NLP research methodology
9
103
419
We have finished uploading our 23 class videos on Multilingual NLP, including two really great guest lectures:
* NLP for Indigenous Languages (by Pat Littell, CNRC)
* Universal NMT (by Orhan Firat, Google)
Looking forward to our *brand new class*, CMU CS11-737 "Multilingual Natural Language Processing" this semester with Yulia Tsvetkov and Alan Black! We're covering the linguistics, modeling, and data that you need to build NLP systems in new languages: 1/2
4
124
414
I wrote a more efficient/robust OpenAI querying wrapper:
1. Parallel execution with adjustable rate limits
2. Automatic retries on failure
3. Interface to Huggingface/Cohere for comparison
This finished 33k completions in ≈1 hour! Available here:
OpenAI recently added a method to make asynchronous calls, which is good if you want many calls quickly. But it’s not super-well-documented, so I wrote a quick demo of how to make many calls at once, e.g. 100+ in a few seconds. Hope it's helpful!
8
63
375
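A minimal sketch of the ideas in the two tweets above (parallel calls, a rate limit, automatic retries), not the wrapper's actual code. It assumes the public OpenAI chat completions HTTP endpoint; the function names, model choice, and retry policy are illustrative.

```python
import asyncio
import os

import aiohttp

API_URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

async def complete_one(session, sem, prompt, retries=3):
    payload = {
        "model": "gpt-3.5-turbo",  # illustrative model choice
        "messages": [{"role": "user", "content": prompt}],
    }
    for attempt in range(retries):
        try:
            async with sem:  # cap the number of in-flight requests
                async with session.post(API_URL, headers=HEADERS, json=payload) as resp:
                    resp.raise_for_status()
                    data = await resp.json()
            return data["choices"][0]["message"]["content"]
        except aiohttp.ClientError:
            await asyncio.sleep(2 ** attempt)  # exponential backoff, then retry
    return None  # give up after `retries` failed attempts

async def complete_all(prompts, max_concurrency=10):
    sem = asyncio.Semaphore(max_concurrency)  # the adjustable rate limit
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(complete_one(session, sem, p) for p in prompts))

# results = asyncio.run(complete_all(["Say hello."] * 100))
```

With a semaphore of 10, the 100 calls in the usage line run 10 at a time rather than sequentially, which is where the "100+ in a few seconds" speedup comes from.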
Here are the slides for my kick-off talk, a high level overview of the exciting promise and current issues with large language models:
Exciting energy for the @LTIatCMU large language model event! Come on out this weekend if you're around Pittsburgh and interested in LLMs
3
71
323
Happy to release NN4NLP-concepts! It's a typology of important concepts that you should know to implement SOTA NLP models using neural nets. We'll reference this in CMU CS11-747 this year, trying to maximize coverage. 1/3
2020 edition of CMU CS11-747 "Neural Networks for NLP", is starting tomorrow! We (co-teacher @stefan_fee and 6 wonderful TAs) restructured it a bit to be more focused on "core concepts" used across a wide variety of applications. 1/2
2
108
315
The videos for the spring semester of CMU 11-711 Advanced NLP are now all available 📺. Thanks to the TAs, students in the class, and everyone who followed along. We're doing it again in the Fall!
I'm excited to be back in the classroom for CMU 11-711 Advanced NLP this semester! We revamped the curriculum to take into account recent advances in LLMs, and we have a new assignment "build-your-own-LLaMa". We'll be posting slides/videos going forward.
4
79
313
We're excited about all the interest in our Gemini report and are working to make it even better! This week we made major improvements, switching to the @MistralAI instruct model and working with the Gemini team to reproduce their results. Updates below.
Google’s Gemini recently made waves as a major competitor to OpenAI’s GPT. Exciting! But we wondered: how good is Gemini really? At CMU, we performed an impartial, in-depth, and reproducible study comparing Gemini, GPT, and Mixtral. Paper: 🧵
8
39
304
I've started to upload the videos for the Neural Nets for NLP class here: We'll be uploading the videos regularly throughout the rest of the semester, so please follow the playlist if you're interested.
2020 edition of CMU CS11-747 "Neural Networks for NLP", is starting tomorrow! We (co-teacher @stefan_fee and 6 wonderful TAs) restructured it a bit to be more focused on "core concepts" used across a wide variety of applications. 1/2
3
69
292
2020 edition of CMU CS11-747 "Neural Networks for NLP", is starting tomorrow! We (co-teacher @stefan_fee and 6 wonderful TAs) restructured it a bit to be more focused on "core concepts" used across a wide variety of applications. 1/2
9
72
294
I've seen quite a few #NAACL2022 papers that say "our code is available at [link]" but the code is not available at "[link]". Everyone, let's release our research code! It's better for everyone, and hey, messy code is better than no code.
8
28
293
One major weakness of open-source multimodal models was document and UI understanding. Not anymore! We trained a model on 7.3M web examples for grounding, OCR, and action outcome prediction, with great results. It's MultiUI, code/data/model are all open:
Working on multimodal instruction tuning and finding it hard to scale? Building Web/GUI agents but the data is too narrow? Introducing 🚀MultiUI: 7.3M multimodal instructions from 1M webpage UIs, offering diverse data to boost text-rich visual understanding. Key takeaways:
5
40
292
Excited to announce @allhands_ai has raised $5M to accelerate development of open-source AI agents for developers! I'm looking forward to further building out the software and the community, and to making AI developers accessible for all 🚀
We are proud to announce that All Hands has raised $5M to build the world’s best software development agents, and do it in the open 🙌 Thank you to @MenloVentures and our wonderful slate of investors for believing in the mission!
19
33
280
Recently there were some great results from the new Mamba architecture by @_albertgu and @tri_dao. We did a bit of third-party validation, and:
1. The results are reproducible
2. Mamba 2.8B is competitive w/ some 7B models (!)
3. Mistral is still strong
Since some of you might be wondering whether Mamba 2.8B can serve as a drop-in replacement of some of the larger models, we've compared the Mamba model family to some of the most popular 7B models in @try_zeno . Report: 🧵 1/5.
3
31
266
Super-excited for the official release of ExplainaBoard, a new concept in leaderboards for NLP. It covers *9* tasks with *7* functionalities to analyze, explore, and combine results. Please try it out, submit systems, and help improve evaluation for NLP!
What's your system good/bad at? Where can your model outperform others? What are the mistakes that the top-10 systems make? We are always struggling with these questions. A new academic tool can help us answer them, and many more, in one-click fashion:
3
74
261
"Paraphrastic representations at scale" is a strong, blazing fast package for sentence embeddings by @johnwieting2. Paper: Code: Beats Sentence-BERT, LASER, USE on STS tasks, works multilingually, and is up to 6,000 times faster 😯
3
45
259
CMU 11-711 Advanced NLP has drawn to a close! You can now access all class materials (slides and videos) online. Hope it's useful, and stay tuned for "11-737 Multilingual NLP" next semester!
In Fall 2021, CMU is updating its NLP curriculum, and 11-747 "Neural Networks for NLP" is being repurposed into 11-711 "Advanced NLP", the flagship research-based NLP class 😃 More NLP fundamentals, still neural network methods. Stay tuned! (CMU students, please register!)
3
55
255
Nice! Our paper on differentiable beam search (@kartik_goyal_, me, @redpony, and Taylor BK) was accepted to AAAI! Read to learn how to backprop through your search algorithm:
1
57
254
Really happy our paper on Differentiable Data Selection will appear at #ICML2020! The method is a *principled* way to choose which data goes into models and it's super-broadly applicable. We've already used it in multilingual models at #acl2020nlp too
Not all training data are equal, but how do we identify the good data efficiently at different stages of model training? We propose to train a data selection agent by up-weighting data whose gradients are similar to the gradient of the dev set:
1
43
247
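A toy sketch of the reweighting signal described in the tweet above, not the paper's actual DDS implementation (which learns a parameterized scorer rather than looping over per-example gradients). All names here are hypothetical; it assumes a PyTorch model and a standard scalar loss function.

```python
import torch

def grad_vector(model, loss):
    """Flatten d(loss)/d(params) into one vector."""
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

def dds_weights(model, loss_fn, train_examples, dev_batch):
    """Weight each (x, y) training example by how well its gradient
    aligns with the dev-set gradient."""
    dev_x, dev_y = dev_batch
    dev_grad = grad_vector(model, loss_fn(model(dev_x), dev_y))
    scores = []
    for x, y in train_examples:  # one backward pass per example: slow but clear
        g = grad_vector(model, loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)))
        scores.append(torch.dot(g, dev_grad))
    # softmax turns alignment scores into a distribution over examples
    return torch.softmax(torch.stack(scores), dim=0)
```

Examples whose gradients point the same way as the dev gradient get upweighted, which is the core intuition behind the method.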
We have started uploading the lecture videos for CS11-737 to YouTube now! You can see the first two, on the class intro and typology.
Looking forward to our *brand new class*, CMU CS11-737 "Multilingual Natural Language Processing" this semester with Yulia Tsvetkov and Alan Black! We're covering the linguistics, modeling, and data that you need to build NLP systems in new languages: 1/2
6
71
244
With long-context LMs, we can now fit *thousands* of training examples in context! We perform an in-depth exploration of many-shot in-context learning, finding it surprisingly effective, providing huge increases over few-shot prompting, and competitive with fine-tuning!
In-context learning provides an LLM with a few examples to improve accuracy. But with long-context LLMs, we can now use *thousands* of examples in-context. We find that this long-context ICL paradigm is surprisingly effective, and it differs in behavior from short-context ICL! 🧵
4
31
242
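A minimal sketch of what "thousands of examples in context" means in practice: pack as many labeled demonstrations as the context budget allows into a single prompt. The function name, formatting, and character budget are illustrative, not from the paper.

```python
def many_shot_prompt(examples, query, max_chars=400_000):
    """examples: list of (input, label) pairs; query: the test input."""
    shots, used = [], 0
    for x, y in examples:
        shot = f"Input: {x}\nLabel: {y}\n\n"
        if used + len(shot) > max_chars:  # stop once the context budget is spent
            break
        shots.append(shot)
        used += len(shot)
    return "".join(shots) + f"Input: {query}\nLabel:"

# prompt = many_shot_prompt(train_pairs, "some new input")
# With a long-context LM, `train_pairs` can contribute thousands of shots.
```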
New #acl2020nlp paper on "Generalizing Natural Language Analysis through Span-relation Representations"! We show how to solve 10 very different natural language analysis tasks with a single general-purpose method -- span/relation representations! 1/
2
51
233
MEGA is a new method for modeling long sequences based on the surprisingly simple technique of taking the moving average of embeddings. Excellent results, outperforming strong competitors such as S4 on most tasks! Strongly recommend that you check it out:
I'm excited to share our work on a new sequence modeling architecture called Mega: Moving Average Equipped Gated Attention. Mega achieves SOTA results on multiple benchmarks, including NMT, Long Range Arena, language modeling, ImageNet and raw speech classification.
2
32
233
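The moving-average idea behind Mega, in miniature: an exponential moving average over a sequence of embeddings. The actual Mega architecture uses a damped multi-dimensional EMA combined with gated attention; this sketch shows only the simple averaging component the tweet highlights.

```python
import torch

def ema_over_sequence(x, alpha=0.1):
    """x: (seq_len, dim) embeddings; returns the EMA at each position."""
    out = torch.empty_like(x)
    state = x.new_zeros(x.shape[1])
    for t in range(x.shape[0]):
        state = alpha * x[t] + (1 - alpha) * state  # blend new token into history
        out[t] = state
    return out
```

Each position sees a smoothed summary of everything before it, which is how a recurrence this cheap can help with long sequences.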
In Fall 2021, CMU is updating its NLP curriculum, and 11-747 "Neural Networks for NLP" is being repurposed into 11-711 "Advanced NLP", the flagship research-based NLP class 😃 More NLP fundamentals, still neural network methods. Stay tuned! (CMU students, please register!)
2021 version of CMU "Neural Networks for NLP" slides and videos are being posted in real time! Check it out for a comprehensive graduate-level class on NLP! New this year: assignment on implementing parts of your own NN toolkit.
1
19
222
This is a well-written overview! Also, for a higher-level, more philosophical take on search in generation models, see my recent class lecture (slides and video). I discuss the relationship between model, search, and output quality.
The 101 for text generation! 💪💪💪 This is an overview of the main decoding methods and how to use them super easily in Transformers with GPT2, XLNet, Bart, T5, and more. It includes greedy decoding, beam search, and top-k/nucleus sampling. By @PatrickPlaten
0
48
223
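The decoding methods named in the overview, as typically invoked through Hugging Face's `generate` API; the model and parameter values here are just illustrative defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("The meaning of life is", return_tensors="pt").input_ids

greedy = model.generate(ids, max_new_tokens=40)                               # greedy (the default)
beam = model.generate(ids, max_new_tokens=40, num_beams=5)                    # beam search
top_k = model.generate(ids, max_new_tokens=40, do_sample=True, top_k=50)      # top-k sampling
nucleus = model.generate(ids, max_new_tokens=40, do_sample=True, top_p=0.95)  # nucleus sampling

print(tok.decode(nucleus[0], skip_special_tokens=True))
```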
#acl2019nlp paper on "Beyond BLEU: Training NMT with Semantic Similarity" by Wieting et al. I like this because it shows 1) a nice use case for semantic similarity, and 2) that we can/should optimize seq2seq models for something other than likelihood or BLEU!
5
58
220
Just released a new survey on prompting methods, which use language models to solve prediction tasks by providing them with a "prompt" like: "CMU is located in __". We worked really hard to make this well-organized and educational for both NLP experts and beginners, check it out!
What is prompt-based learning, and what challenges are there? Will it be a new paradigm or a way for human-PLM communication? How does it connect with other research, and how should we position it in the evolution of the NLP research paradigm? We released a systematic survey and beyond.
1
55
217
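The cloze-style prompting from the survey's example, in a few lines: a masked LM fills in the blank of "CMU is located in __". This is one standard way to run such a prompt (via the Hugging Face fill-mask pipeline); the model choice is illustrative.

```python
from transformers import pipeline

# Masked-LM prompting: the [MASK] token plays the role of the blank.
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("CMU is located in [MASK].", top_k=3):
    print(pred["token_str"], pred["score"])
```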