Jay Alammar Profile
Jay Alammar

@JayAlammar

Followers: 35,468
Following: 1,260
Media: 479
Statuses: 1,773

Machine learning and language models R&D. Builder. Writer. Visualizing AI, ML, and LLMs one concept at a time. @Cohere.

Joined April 2020
Pinned Tweet
@JayAlammar
Jay Alammar
2 months
And here you have it! The cover for Hands-On Large Language Models. And the animal is: *drum roll* The Red Kangaroo! Why the Red Kangaroo? The process of choosing cover animals is a closely guarded secret held deep within the legendary halls of @OReillyMedia. @MaartenGr and I
14
57
467
@JayAlammar
Jay Alammar
4 years
How GPT3 works. A visual thread. A trained language model generates text. We can optionally pass it some text as input, which influences its output. The output is generated from what the model "learned" during its training period where it scanned vast amounts of text. 1/n
31
778
2K
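To make the thread above concrete, here is a minimal sketch of that generate-from-a-prompt loop, using GPT-2 (an open ancestor of GPT-3) via Hugging Face Transformers as a stand-in:

```python
# A trained language model generates text; an optional prompt influences the
# output. GPT-2 stands in here for GPT-3, which has no public weights.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model emits one token at a time, based on what it "learned" in training.
print(generator("A trained language model generates", max_new_tokens=20)[0]["generated_text"])
```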
@JayAlammar
Jay Alammar
2 years
The Illustrated Stable Diffusion New post! Over 30 visuals explaining how Stable Diffusion works (diffusion, latent diffusion, CLIP, and a lot more).
31
500
2K
@JayAlammar
Jay Alammar
2 years
pip install scikit-learn It's easy to take for granted, but this single command gives you functionality I'd value at hundreds of thousands of dollars, if not more. Not to mention amazing documentation that beautifully weaves guides and references. Hats off to @scikit_learn
16
135
1K
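For a sense of what that one command buys you, here is a tiny self-contained taste (the dataset and model choices are just illustrative):

```python
# A full train/evaluate loop in a few lines, all from `pip install scikit-learn`.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier().fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```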
@JayAlammar
Jay Alammar
2 years
A 🧵looking at DeepMind's Retro Transformer, which at 7.5B parameters is on par with GPT3 and models 25X its size in knowledge-intensive tasks. A big moment for Large Language Models (LLMs) for reasons I'll mention in this thread.
10
197
966
@JayAlammar
Jay Alammar
3 years
Presenting the Explainable AI Cheat Sheet: Video: Cheat Sheet: A high-level map to major categories of ML Explainability. Informed by excellent work by @ChristophMolnar @IAugenstein @sameer_ and others. Plenty of links!
6
203
815
@JayAlammar
Jay Alammar
2 years
AI image generation is the most recent mind-blowing AI capability. #StableDiffusion is a clear milestone in this development because it made a high-performance model available to the masses. This is how it works. 1/n
13
146
695
@JayAlammar
Jay Alammar
2 years
The Illustrated Retrieval Transformer New post! A visual look at language models that perform on par with GPT3 at 4% of the size.
5
127
628
@JayAlammar
Jay Alammar
3 years
Interfaces for Explaining Transformer Language Models A new blog post (with interactive explorables) to make transformers more transparent. It shows input saliency for generated text, and (VASTLY more interesting) neuron activations
4
154
628
@JayAlammar
Jay Alammar
8 months
What makes LLM tokenizers different from each other? GPT4 vs. FlanT5 vs. Starcoder vs. BERT, and more. Tokenizers are one of the key components of Large Language Models (LLMs). One of the best ways to understand what they do is to compare the behavior of different tokenizers. In
14
70
547
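A sketch of that comparison with three open checkpoints (the GPT-4 tokenizer lives in tiktoken rather than on the Hub, so FLAN-T5 stands in alongside BERT and GPT-2; the probe text is made up):

```python
# Same text, three tokenizers: compare how each one splits it.
from transformers import AutoTokenizer

text = "Let's compare tokenizers: CamelCase, emoji 🤗, and code like x+=1"

for name in ["bert-base-uncased", "gpt2", "google/flan-t5-base"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(name, tokenizer.tokenize(text))
```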
@JayAlammar
Jay Alammar
9 months
Our new short course, "Large Language Models with Semantic Search" is now live! In it, you'll learn how to use LLMs to build the next generation of search systems using concepts like embedding and reranking. Hope you enjoy it! What an incredible honor to
@AndrewYNg
Andrew Ng
9 months
We just released "Large Language Models with Semantic Search”, built with @cohere , and taught by @JayAlammar and @SerranoAcademy . Search is a key part of many applications. Say, you need to retrieve documents or products in response to a user query; how can LLMs help? You’ll
37
610
3K
15
71
474
@JayAlammar
Jay Alammar
2 years
How awesome are those visuals in @pandas_dev Getting Started! Who did this? Seriously, kudos!
3
86
467
@JayAlammar
Jay Alammar
29 days
If the rise of LLMs caught you by surprise, here's your chance to get a preview of what's likely to be the next monumental jump in AI capabilities: LLM-backed agents that use software tools In this video, I'll walk you through the concepts and code of building an LLM-backed
5
100
465
@JayAlammar
Jay Alammar
3 years
Probing Classifiers: A Gentle Intro (Explainable AI for Deep Learning) New video! Probing Classifiers are an Explainable AI tool used to make sense of the representations that deep neural networks learn for their inputs.
2
92
450
@JayAlammar
Jay Alammar
2 years
Finetuning Text Embedding Models Achieving peak performance in tasks like text classification and semantic search often requires finetuning an embedding model. This is one of the key intuitions one needs to build when using Large Language Models.
3
63
440
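The tweet doesn't name a library; as one possible sketch, here is what finetuning an embedding model looks like with sentence-transformers, on made-up toy pairs:

```python
# Finetune a sentence-embedding model on (text, text, similarity) pairs.
# The pairs and labels below are hypothetical toy data.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

train_examples = [
    InputExample(texts=["How do I reset my password?",
                        "Password reset instructions"], label=1.0),
    InputExample(texts=["How do I reset my password?",
                        "Quarterly revenue report"], label=0.0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)  # pull pairs together/apart

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```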
@JayAlammar
Jay Alammar
2 years
We're launching @CohereAI Sandbox – open-source libraries to help developers experiment with language AI. I've been working on topic modeling using LLMs: -1-
4
83
438
@JayAlammar
Jay Alammar
2 years
Intro to Basic Semantic Search A gentle guide to building simple semantic search features that go beyond keyword search. Uses sentence embeddings and Annoy to build a "similar questions" feature.
2
81
433
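A minimal sketch of that "similar questions" feature, assuming sentence-transformers for the embeddings and Annoy for the index (questions are stand-ins):

```python
# Embed questions, index them with Annoy, then query by nearest neighbor.
from annoy import AnnoyIndex
from sentence_transformers import SentenceTransformer

questions = [
    "How do I install numpy?",
    "What is a tensor?",
    "How can I set up numpy on my machine?",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence embedder works
embeddings = model.encode(questions)

index = AnnoyIndex(embeddings.shape[1], "angular")  # angular ~ cosine distance
for i, vector in enumerate(embeddings):
    index.add_item(i, vector)
index.build(10)  # 10 trees; more trees = better recall, bigger index

# Neighbors of question 0: itself first, then its paraphrase at position 2.
print(index.get_nns_by_item(0, 2))
```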
@JayAlammar
Jay Alammar
3 years
Ecco – See what your NLP language model is “thinking” Ecstatic to release my first open-source project! Interactive visualizations in jupyter for @huggingface GPT2-based language models. Github: HN:
7
97
423
@JayAlammar
Jay Alammar
1 year
Big update to "The Illustrated Stable Diffusion" post 14 new and updated visuals. The biggest update is that forward diffusion is more precisely explained -- not as a process of steps (that are easy to confuse with de-noising steps). -1-
6
80
412
@JayAlammar
Jay Alammar
4 months
From the various tools that enable building solutions with large language models (LLMs), DSPy stands out to me as one of the most promising tools for building LLM pipelines. I got to speak to @lateinteraction and ask him to introduce DSPy and what he envisions for its future.
6
39
402
@JayAlammar
Jay Alammar
10 months
ChatGPT has Never Seen a Single Word (Despite Reading All of The Internet). A Glance at LLM Tokenizers. New Video! It's fascinating that the actual input to language models is not exactly the text we pass them! Learn more about tokenizers, a key component of LLMs. Link in reply
10
70
404
@JayAlammar
Jay Alammar
2 years
When training binary classifiers in @PyTorch , make sure to use the correct binary loss for your network structure. BCEWithLogitsLoss improves numeric stability, but make sure you pass the actual logit output because it will apply the sigmoid itself.
2
47
395
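The point of the tweet, in a small sketch (not from the original post):

```python
# BCEWithLogitsLoss fuses sigmoid + binary cross-entropy for numeric
# stability, so the network must output raw logits, not probabilities.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 1))  # note: no nn.Sigmoid() at the end
criterion = nn.BCEWithLogitsLoss()       # applies the sigmoid internally

x = torch.randn(4, 10)
target = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

logits = model(x)                 # raw scores, not probabilities
loss = criterion(logits, target)
loss.backward()

# If the network already ends in nn.Sigmoid(), use nn.BCELoss() instead;
# feeding probabilities to BCEWithLogitsLoss silently computes the wrong loss.
```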
@JayAlammar
Jay Alammar
4 years
How GPT3 Works - Visualizations and Animations A compilation of my threads explaining GPT3. I'll still post early drafts here on Twitter, but that post is the proper & final home for them all. 1/n of the second thread
1
112
379
@JayAlammar
Jay Alammar
2 years
The Illustrated Retrieval Transformer New video! Language models are improved by giving them the ability to query a database or search the web for information. Here's a look at one way of doing that.
1
65
355
@JayAlammar
Jay Alammar
6 months
Tokenizers and self-attention both lie at the heart of the LLM boom. Learn about them and more in the most recent post on the newsletter. LLM Tokenizers, Semantic Search Course, And Book Update #2 The update on attention is a teaser to a chapter
2
46
355
@JayAlammar
Jay Alammar
2 years
Software is eating the world. Machine learning is eating software. Transformers are eating machine learning. Oversimplifications, to be sure, but this trail of utility to economic value is evident and we don't yet understand how drastically it will shift economic value. 1/n
@jordiae
jordiae
2 years
AlphaStar (2019) vs. Gato (2022) architectures:
19
198
1K
5
50
328
@JayAlammar
Jay Alammar
2 years
Intro to Large Language Models with Cohere A high-level look at large language models and some of their applications for language processing. It covers text generation models (like GPT) and representation models (like BERT).
2
73
332
@JayAlammar
Jay Alammar
3 years
Inspecting Neural Networks with Canonical Correlation Analysis - A Gentle Intro New Video! Methods like CKA, PWCCA, and SVCCA serve as similarity measures, revealing insights into how a neural network processes its inputs.
4
71
328
@JayAlammar
Jay Alammar
1 year
Despite the Generative AI craze, one of the most exciting and reliably useful areas of AI is not generative at all. It is search. Learn about Neural Search from @Nils_Reimers , creator of Sentence Transformers, and @CohereAI director of ML/embeddings
6
69
324
@JayAlammar
Jay Alammar
3 years
Einsum is a key method in summing and multiplying tensors. It's implemented in @numpy_team , @TensorFlow , AND @PyTorch . Here's a visual intro to Einstein summation functions. 1/n
2
56
317
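The core rule in four one-liners: repeated indices multiply, and any index missing from the output gets summed over. The same notation works in NumPy, TensorFlow, and PyTorch:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
v = np.array([1.0, 2.0, 3.0])

print(np.einsum("ij->ji", A))        # transpose
print(np.einsum("ij->", A))          # sum of all elements
print(np.einsum("ij,jk->ik", A, B))  # matrix multiply: sum over shared index j
print(np.einsum("i,i->", v, v))      # dot product
```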
@JayAlammar
Jay Alammar
4 years
How GPT-3 Works - Easily Explained with Animations A gentle and visual look at how the API/model works under the hood -- including how the model is trained, and how it calculates its predictions. New Video!
6
69
316
@JayAlammar
Jay Alammar
3 years
This Intro to Deep Unsupervised Learning is excellent. It's presented by Alec Radford, the first author of papers including GPT, GPT2, DCGAN, and CLIP. Covers word2vec, Glove, RNNs, ELMo, BERT, T5, Electra, and more.
2
51
318
@JayAlammar
Jay Alammar
2 years
A Visual Guide to Prompt Engineering Large GPT language models are rising in prominence as language processing and generation tools. They can write, paraphrase, and summarize, but they can also classify. This is a gentle starting guide to prompts.
5
67
309
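The classification idea boils down to a prompt shaped like this (the reviews and labels below are invented for illustration):

```python
# A few-shot classification prompt: a few labeled examples, then the text to
# classify. Send `prompt` to any generative LLM; the first word it produces
# ("positive") is the classification.
prompt = """Classify the sentiment of each movie review.

Review: I loved every minute of it.
Sentiment: positive

Review: A total waste of two hours.
Sentiment: negative

Review: The visuals were stunning and the story moved me.
Sentiment:"""
```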
@JayAlammar
Jay Alammar
4 years
How does BERT answer questions? In this explorable, @betty_v_a shows how the layers of BERT successively mutate the representations of input words (question and context) so the correct answer ("bathroom") ends up isolated enough for the model to pick
1
75
308
@JayAlammar
Jay Alammar
1 year
Scatterplots are amazing for exploration. We use them all the time for text (using embeddings). It's the first time I get to explore a music scatter plot -- each point is 3 seconds of music. Fascinating work by @philtgun at
8
43
298
@JayAlammar
Jay Alammar
2 years
Entity Extraction with Large Language Models In this article and notebook, @nickfrosst and I walk you through extracting movie names from r/movies posts using a generative language model.
2
50
293
@JayAlammar
Jay Alammar
2 years
So many exciting things happening in ML these days. DeepMind's Gato is the direction I'm excited about the most. One small-ish model that learns text, images, playing video games, robotic sensors and control. Everything is a sequence! Let's work out how: 1/n
@NandoDF
Nando de Freitas 🏳️‍🌈
2 years
Two years in the making by a talented, collaborative, and fun team, and with enormous help and support from many others at @DeepMind . No better place to be! Congrats @scott_e_reed on this step.
18
30
352
7
47
289
@JayAlammar
Jay Alammar
6 months
The next generation of RAG applications will 1) include a query-rewriting step and 2) provide citations for their sources. This is an incredible visual guide on how to build it end-to-end. Colab:
@cohere
cohere
6 months
The Chat endpoint with RAG is easy to use, but it's also customizable. In document mode, the endpoint is highly modular. In this LLM University chapter, learn how to build a RAG-powered chatbot with the Chat, Embed, and Rerank endpoints.
1
27
131
6
61
292
@JayAlammar
Jay Alammar
1 year
New model alert! @CohereAI 's new embedding model supports 100+ languages and delivers 3X better performance than existing open-source models. See the post by @Nils_Reimers and @amrmkayid :
4
38
280
@JayAlammar
Jay Alammar
8 months
I'm writing an updated version of The Illustrated Transformer for the upcoming Hands-On LLMs book I'm co-writing with @MaartenGr . What updates/developments in the past 5 years do you feel should be a definitive addition to an intro to the architecture? Lots of additions to
20
34
272
@JayAlammar
Jay Alammar
5 months
I caught up with @abertsch72 at #NeurIPS2023 , who was presenting Unlimiformer, a retrieval-augmentation method for encoder-decoder models allowing unlimited length inputs. Paper: Unlimiformer: Long-Range Transformers with Unlimited Length Input Work with @urialon1 @gneubig , and
2
44
264
@JayAlammar
Jay Alammar
3 years
Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP) New video! A brief and highly accessible intro to BERT, where you have used it, and the various applications it powers.
2
48
258
@JayAlammar
Jay Alammar
2 months
This week, we launched Command-R, which crowns @cohere's stack of RAG-optimized language models. Join me at any of these upcoming dates as I break down this advanced-RAG, multilingual stack: March 12, 3PM: #SXSW2024, Austin, Texas. Hacks / Hackers (Sign up:
5
23
238
@JayAlammar
Jay Alammar
4 years
I'm still in awe of this site and how it visually explains statistical concepts in an interactive manner.
5
50
245
@JayAlammar
Jay Alammar
2 years
Large Language Models for Real-World Applications - A Gentle Intro My talk from @PyData London is now online! It covers three top LLM use cases we see at @CohereAI (classification, semantic search, text generation). Here are the five main slides:
2
47
239
@JayAlammar
Jay Alammar
1 year
Remaking Old Computer Graphics With AI Image Generation New post! I take Dream Studio, Midjourney, and DALL-E for a test drive: recreating an old video game cinematic. In the end, I share my current impression of these services.
8
36
242
@JayAlammar
Jay Alammar
3 years
Ecstatic to see "Machine learning research communication via illustrated and interactive web articles" published at @rethinkmlpapers workshop at #ICLR2021 In it, I describe my workflow for communicating ML to millions of readers. Paper: 1/5
6
48
235
@JayAlammar
Jay Alammar
4 years
Just published! My "Visual Intro to Machine Learning and Deep Learning" talk at QCon 2020. A gentle intro to ML for software engineers where I go over 10 foundational concepts, 4 applications, and 3 tools to get you started on your journey.
4
50
229
@JayAlammar
Jay Alammar
4 years
The Narrated Transformer Language Model A new video! A high-level overview of transformer language models. It addresses both the transformer architecture and language modeling (as that makes a simpler intro than machine translation)
6
44
229
@JayAlammar
Jay Alammar
3 years
A Gentle Intro to Transformer language models and how to make them more transparent My talk at @PydataKhobar is now live! Thanks to the organizers. Colab:
3
38
221
@JayAlammar
Jay Alammar
3 years
Seeing Voices: 1 - Intro to Spectrograms New video! I have been captivated by this method that visualizes sound. It's used in ML for speech recognition, but is also opening the door to a better understanding of animal communication and intelligence.
4
37
222
@JayAlammar
Jay Alammar
2 years
If you're a visual learner, be sure to check out @MeorAmer1 's Visual Intro to Deep Learning. Meor's ability to create visual language explaining ML concepts is absolutely remarkable.
1
34
219
@JayAlammar
Jay Alammar
5 months
Awesome poster presentation by @Muennighoff for the paper "Scaling Data-Constrained Language Models" at #NeurIPS2023 Kudos @srush_nlp @boazbaraktcs @Fluke_Ellington @olapiktus @Nouamanetazi Sampo Pyysalo @Thom_Wolf @colinraffel
2
29
220
@JayAlammar
Jay Alammar
2 years
Hats off to @psuraj28 @pcuenq @natolambert @PatrickPlaten for this great writeup explaining how Stable Diffusion works. The most helpful for me so far. @AICoffeeBreak video is also great
1
45
215
@JayAlammar
Jay Alammar
1 year
I'm going to be honest. I hyperventilated a little when I saw this dataset internally. All of Wikipedia. Embedded. Passage by passage. Not only English, but 9 other languages as well. Ecstatic to get to put it in your hands
@cohere
cohere
1 year
What could you build if you had the embeddings of ALL of wikipedia? The Embedding Archives: Millions of Wikipedia Article Embeddings in Many Languages We’re publishing ~100 million embedding vectors, covering Wikipedia in 10 languages. Get them now!
187
824
5K
7
30
213
@JayAlammar
Jay Alammar
1 year
Language Models and Machine Learning: What a Time for Language Models
9
55
212
@JayAlammar
Jay Alammar
2 years
Ecco v0.1.0 is out! Massive update. - Support for T5, T0, DeBERTa, and ability to add other/local models - Feature attribution via Integrated Gradients and many other methods - Support for Beam Search generation
5
34
211
@JayAlammar
Jay Alammar
11 months
Good morning #ACL2023NLP ! Excited for my first ACL since Gathertown. Would love to say hi if you're here! I'll be tweeting my experience in this thread over the next few days.
6
9
212
@JayAlammar
Jay Alammar
1 month
AI Agents will take the abilities of LLMs to a whole new level. Here's how to build a simple agent that can use software tools like searching the web or writing and running python code (LLMs love to write @matplotlib code for you).
@cohere
cohere
1 month
Automate your enterprise workflows with Cohere's multi-step tool use. Our generative model Command R+ excels at leveraging external tools to execute complex tasks to streamline business operations. Get started today!
0
15
73
4
37
212
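A rough sketch of the loop such an agent runs; `llm`, the message format, and both tool bodies are hypothetical stand-ins for illustration, not Cohere's actual API:

```python
# Agent loop: ask the model, run any tool it requests, feed the result back,
# repeat until the model answers directly or we hit the step limit.
import json

def search_web(query: str) -> str:
    ...  # call your search API of choice here (stub)

def run_python(code: str) -> str:
    ...  # execute in a sandbox and return the output (stub)

TOOLS = {"search_web": search_web, "run_python": run_python}

def run_agent(task: str, llm, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(messages, tools=list(TOOLS))  # model may request a tool
        if not reply.get("tool_call"):
            return reply["content"]               # no tool needed: final answer
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    return "step limit reached"
```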
@JayAlammar
Jay Alammar
3 years
Behavioral Testing of ML Models (Unit tests for machine learning) New video! Creating unit tests for ML models gives us a higher-resolution understanding of model performance -- allowing us to better compare models and observe degradation.
4
48
204
@JayAlammar
Jay Alammar
2 years
Applying massive language models in the real world with @CohereAI. This is a round-up of some of my recent writings and collaborations on applying large language models at Cohere. They contain a bunch of intuitions for problem solving with LLMs.
0
43
202
@JayAlammar
Jay Alammar
1 year
What's the big deal with Generative AI? Is it the future or the present? New post! This is part 1 of reflections on how best to think of the current state of AI products and features, & avoid the pitfalls people tend to run into with new tech. Four main points:
8
43
203
@JayAlammar
Jay Alammar
5 months
The scale of #NeurIPS2023 is staggering. This is a look at just one of the poster sessions. If only AI could help us explore / understand / browse / better search all this knowledge..
9
25
202
@JayAlammar
Jay Alammar
8 months
Let's look at different tokenizers in action -- explaining so much of how an LLM "sees" text. New Video! (link in response) We have carefully crafted a piece of text that reveals so much about how an LLM parses its input. We pass it to BERT, GPT4, GPT2, Galactica, Starcoder,
3
39
202
@JayAlammar
Jay Alammar
5 months
LLM-backed agents have been some of the most futuristic LLM directions in 2023. The Voyager paper, presented here by coauthor @yuqi_xie5 at a #neurips2023 workshop, was certainly one of the most fascinating. With the right framing, a (text+code only) LLM can successfully
3
32
201
@JayAlammar
Jay Alammar
3 years
Oversimplified example of self-attention, the concept behind a lot of the current progress in AI/ML. Say a model needs to process the sentence: "A robot must obey the orders given 𝗶𝘁 by human beings" Self-attention helps the model resolve which word "𝗶𝘁" refers to.
5
21
191
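Here is the same idea as a minimal scaled-dot-product attention sketch in NumPy; the embeddings are random placeholders, so only the mechanics carry over, not the learned link from "it" to "robot":

```python
import numpy as np

tokens = ["A", "robot", "must", "obey", "orders", "given", "it"]
d = 8
rng = np.random.default_rng(0)

X = rng.normal(size=(len(tokens), d))         # one (placeholder) embedding per token
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)                 # every token scored against every token
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V                          # context-mixed representation per token

print(weights[tokens.index("it")].round(2))   # where "it" puts its attention
```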
@JayAlammar
Jay Alammar
3 years
Favorite AI/ML Books: Intro to ML with Python (Book Review) New Video! I go over the awesome "Intro to ML with Python" by @amuellerml and Sarah Guido. A book that helped me understand many applied ML methods.
4
37
188
@JayAlammar
Jay Alammar
3 years
What are inductive biases? Can models make different predictions when trained on the same data? @RTomMcCoy distills the concept incredibly well in this one graphic. More: Video:
2
38
182
@JayAlammar
Jay Alammar
2 years
Top-k and Top-p are key parameters for controlling the output of GPT models. They are two possible decoding strategies (or call them 'token-picking methods'). This is a visual look at how they work as the last step in GPT text generation.
2
34
183
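Both strategies, sketched over a toy six-token vocabulary with made-up probabilities:

```python
import numpy as np

probs = np.array([0.42, 0.25, 0.15, 0.10, 0.05, 0.03])

def top_k(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    keep = np.argsort(probs)[-k:]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

def top_p(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    order = np.argsort(probs)[::-1]             # most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # tokens needed to reach p
    filtered = np.zeros_like(probs)
    filtered[order[:cutoff]] = probs[order[:cutoff]]
    return filtered / filtered.sum()

print(top_k(probs, 3))    # only the top 3 tokens remain candidates
print(top_p(probs, 0.8))  # 0.42+0.25+0.15 >= 0.8, so 3 tokens remain
```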
@JayAlammar
Jay Alammar
1 month
LLMs are finally breaking free from short context lengths using methods like Ring Attention. Don't miss this visual explainer by @khshind @simonguozirui @bonniesjli
@khshind
Kilian Haefeli
1 month
How do state-of-the-art LLMs like Gemini 1.5 and Claude 3 scale to long context windows beyond 1M tokens? Well, Ring Attention by @haoliuhl presents a way to split attention calculation across GPUs while hiding the communication overhead in a ring, enabling zero overhead scaling
4
69
315
3
28
187
@JayAlammar
Jay Alammar
4 years
Jay's Visual Intro to AI I made a video introducing AI and some of its key business applications. I talk about the motivation of using AI, and the simple trick that lies at the heart of the majority of AI/ML applications in the real world.
5
57
183
@JayAlammar
Jay Alammar
4 years
I like this graphic from a @huggingface notebook on tokenization. It shows three tokenization schemes with examples, and how vocabulary size increases across different schemes. GPT's tokenization is similar to the one in the middle.
1
44
184
@JayAlammar
Jay Alammar
8 months
One of the best investments you can make in your AI Engineering skillset is to be comfortable with the ideas of using language models for search. In "Using LLMs for Search with Dense Retrieval and Reranking", @SerranoAcademy and I give you the key intuitions for building this
@cohere
cohere
8 months
In our latest blog post, Cohere's Head of Developer Relations @SerranoAcademy and Engineering Director @JayAlammar provide a comprehensive overview of how to use LLMs to power state-of-the-art search.
93
150
1K
2
34
186
@JayAlammar
Jay Alammar
3 months
Guess the animal on the cover of our upcoming Hands-On Large Language Models book for a chance to win a free copy! There's a secret method that assigns the animals of @OReillyMedia books. Even @MaartenGr and I, as authors, don't know what the animal is until it is assigned.
138
14
180
@JayAlammar
Jay Alammar
3 years
Favorite python books: Effective Python New video! I go over @haxor 's excellent advanced python book with recommendations on how to make your code more pythonic.
3
21
175
@JayAlammar
Jay Alammar
3 years
Finding the Words to Say: Hidden State Visualizations for Language Models New post! Visualizations glancing at the "thought process" of language models & how it evolves between layers. Builds on awesome work by @nostalgebraist @lena_voita @tallinzen . 1/n
1
38
174
@JayAlammar
Jay Alammar
3 months
AI Agents are some of the most drastic technological changes on the horizon. I asked CMU professor @gneubig about how best to define the current crop of AI agents and where he sees them going. Links to our full conversation are in a reply. We discussed LLM evaluations, new
3
30
164
@JayAlammar
Jay Alammar
3 years
So many fascinating ideas at yesterday's #blackboxNLP workshop at #emnlp2020 . Too many bookmarked papers. Some takeaways: 1- There's more room to adopt input saliency methods in NLP. With Grad*input and Integrated Gradients being key gradient-based methods.
2
37
156
@JayAlammar
Jay Alammar
3 years
Self-attention is an important component of the transformer, but not the only one. Some might misunderstand "Attention is all you need" to mean that all the key computation happens in attention layers. In reality, it's more like "Attention can replace recurrence/convolutions"
3
13
160
@JayAlammar
Jay Alammar
2 years
Combing For Insight in 10,000 Hacker News Posts With Text Clustering New blog post! I embedded and clustered the top HN posts looking for insight on personal/career development. I built an interactive map and found ~700 posts that fit the bill. 1/n
7
40
158
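The pipeline from the post, in outline (titles below are stand-ins, and the model and cluster count are just examples):

```python
# Embed post titles, cluster the embeddings, then eyeball each cluster for a theme.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

titles = [
    "Ask HN: How do you avoid burnout?",
    "Ask HN: Negotiating a senior engineer salary",
    "Show HN: My weekend side project",
    "Why I left big tech to freelance",
]

embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(titles)
labels = KMeans(n_clusters=2, random_state=0).fit_predict(embeddings)

for label, title in sorted(zip(labels, titles)):
    print(label, title)
```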
@JayAlammar
Jay Alammar
1 year
AI Art Explained: How AI Generates Images New video! If you want to know how AI image generation works and how it's trained, this video is for you! With dozens of original figures explaining the internal mechanics of diffusion models.
6
37
160
@JayAlammar
Jay Alammar
3 years
I just learned that the creator of the excellent sklearn cheat sheet is @amuellerml . This comes a day after I shot a video about his excellent ML Intro book which REALLY helped me learn ML when I started out. Technical communication wizard. Coming up next on the YouTube channel
1
22
153
@JayAlammar
Jay Alammar
3 years
The covariance matrix is an essential tool for analyzing relationships in data. In numpy, you can use the np.cov() function to calculate it. Here's a shot at visualizing the elements of the covariance matrix and what they mean: 1/5
2
22
156
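A quick np.cov demonstration with two toy variables:

```python
# Diagonal entries are each variable's variance; off-diagonal entries show
# how the pair co-varies (positive here, since y rises along with x).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # steadily increases...
y = np.array([2.1, 3.9, 6.2, 7.8])   # ...and y rises along with it

print(np.cov(x, y))
# [[1.667 3.233]
#  [3.233 6.3  ]]  (approx.)
```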
@JayAlammar
Jay Alammar
3 years
If you're curious how Github Copilot works, this is a gentle intro to GPT3 (the ancestor of Codex, which powers Copilot)
@JayAlammar
Jay Alammar
4 years
How GPT3 works. A visual thread. A trained language model generates text. We can optionally pass it some text as input, which influences its output. The output is generated from what the model "learned" during its training period where it scanned vast amounts of text. 1/n
31
778
2K
4
36
151
@JayAlammar
Jay Alammar
2 years
A Generalist Agent (Gato) - DeepMind's single model learns 600+ tasks New video! Gato's tokenization method maps tasks from text, vision, and control to token sequences learned by a single 1.18B param GPT model.
3
35
153
@JayAlammar
Jay Alammar
4 years
On the transformer side of #acl2020nlp , three works stood out to me as relevant if you've followed the Illustrated Transformer/BERT series on my blog: 1- SpanBERT 2- BART 3- Quantifying Attention Flow (1/n)
2
25
149
@JayAlammar
Jay Alammar
2 years
I had the pleasure of hosting @MaartenGr to speak about BERTopic, and discuss topic modeling, visualization, API design, modularity, and other topics. Watch it now! Episode #1 of Talking Language AI: Overview blogpost:
4
37
148
@JayAlammar
Jay Alammar
3 years
Ecstatic and honored that Ecco was published as an #ACL2021NLP demo paper! Ecco: An Open Source Library for the Explainability of Transformer Language Models v0.0.15 is out now!
1
25
148
@JayAlammar
Jay Alammar
2 years
A language model thinks this Dune review is negative: "I have a well-documented weakness for sci-fi and expected Dune to feed my soul. I didn't expect it to entirely blow my mind." Which input words lead to this prediction? These. Darker is more important.
6
16
142
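One common way to compute such word-importance scores is gradient × input saliency; here is a hedged sketch against an off-the-shelf sentiment model (the post may use a different attribution method):

```python
# Backprop the predicted class score to the input embeddings, then score each
# token by the magnitude of (gradient * embedding).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

text = "I didn't expect it to entirely blow my mind."
inputs = tokenizer(text, return_tensors="pt")

embeddings = model.get_input_embeddings()(inputs["input_ids"])
embeddings.retain_grad()  # keep the gradient on this non-leaf tensor
logits = model(inputs_embeds=embeddings,
               attention_mask=inputs["attention_mask"]).logits

logits[0, logits.argmax()].backward()  # gradient of the predicted class score
saliency = (embeddings.grad * embeddings).norm(dim=-1).squeeze()

for token, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
                        saliency.tolist()):
    print(f"{token:>12} {score:.3f}")  # darker / larger = more important
```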
@JayAlammar
Jay Alammar
3 years
Be sure to check out the awesome NLP course by @lena_voita. It's highly visual, well animated, and even has interactive explorables (scroll down to 'Sampling with temperature' to get the intuition for the 'temperature' parameter in language models).
@alexip
Alexis Perrier
3 years
Just stumbled upon this fantastic NLP course by @lena_voita Includes embeddings, language modeling, Seq2seq and Attention and more
0
21
101
1
28
141
@JayAlammar
Jay Alammar
2 months
LLM developers loved Command R; some called it the RAG King. Well, hang on till you meet Command R+. Out now! Open weights. Much, much more capable: - Multi-hop RAG: It takes RAG capabilities to a whole new level; when dealing with complex questions, it’s able to search for
@aidangomez
Aidan Gomez
2 months
⌘R+ Welcoming Command R+, our latest model focused on scalability, RAG, and Tool Use. Like last time, we're releasing the weights for research use; we hope they're useful to everyone!
26
190
984
2
21
140
@JayAlammar
Jay Alammar
4 years
I've been enjoying learning the Trax deep learning library. I've created an intro notebook to the Transformer Language Model (on which GPT is based): It's a great way to start learning how transformer models are built. 1/n
1
28
137
@JayAlammar
Jay Alammar
3 years
We live in an AWESOME age of enlightenment. Oh, you wanna learn about SVD? Have @luis_likes_math break it down for you[1]. Or have @3blue1brown show you how to bend space with your mind (& linear algebra) [2]. Or just attend the whole MIT course [3]. We're incredibly blessed.
6
13
138
@JayAlammar
Jay Alammar
3 years
The Unreasonable Effectiveness of RNNs (Article and Visualization Commentary) New Video! I comment on one of my favorite ML articles which helped me break into ML and NLP. We take a look at its visualizations of neuron firings.
2
13
131
@JayAlammar
Jay Alammar
10 months
Good morning #ACL2023NLP day #3 ! I'll be sharing more notes from the conference in this thread, but also.. POSTER PRESENTATION VIDEOS! If you're here, stop by the @cohere & @forai_ml booth and say hello to @max_nlp @Nils_Reimers @PSH_Lewis @sarahookr @SerranoAcademy
3
13
131
@JayAlammar
Jay Alammar
2 years
Great breakdown of TF-IDF by @c_brinton and @davidinouye1
4
19
129
@JayAlammar
Jay Alammar
5 months
Hello #NeurIPS2023! Looking forward to meeting everybody. Drop by booth 1109 and meet @cohere and @CohereForAI folks and discuss everyone's work! [noticing @CShorten30 and @ecardenas300 walk by at the end and, on cue, saying hi]
5
14
129
@JayAlammar
Jay Alammar
1 year
#EMNLP2022 here we go!
2
13
126