Volodymyr Kuleshov πŸ‡ΊπŸ‡¦

@volokuleshov

Followers: 9K Β· Following: 2K Β· Media: 456 Β· Statuses: 3K

Co-Founder @InceptionAILabs | Prof @Cornell & @Cornell_Tech | PhD @Stanford

Joined July 2013
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 months
Excited to announce the first commercial-scale diffusion language model---Mercury Coder. Mercury runs at 1000 tokens/sec on Nvidia hardware while matching the performance of existing speed-optimized LLMs. Mercury introduces a new approach to language generation inspired by image diffusion models.
@InceptionAILabs
Inception Labs
3 months
We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
31
34
390
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
This is wild. Take MNIST, feed it pixel by pixel to an LLM, followed by the label (β€œx1=5, x2=9, …, y=3”). Fine-tune on this dataset. This reaches 99% accuracy. Also works on other small datasets.
Tweet media one
29
106
1K
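A minimal sketch of the serialization this tweet describes (my illustration; the paper's exact prompt format may differ):

```python
# Hypothetical sketch: flatten each MNIST image into an
# "x1=.., x2=.., ..., y=.." string for LLM fine-tuning, as described above.
from torchvision.datasets import MNIST

def to_prompt(pixels, label):
    xs = ", ".join(f"x{i+1}={int(p)}" for i, p in enumerate(pixels))
    return f"{xs}, y={label}"

ds = MNIST(root="data", download=True)             # PIL images + int labels
img, label = ds[0]
print(to_prompt(list(img.getdata()), label)[:80])  # truncated for display
```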
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
4 years
Did you ever want to learn more about machine learning in 2021? I'm excited to share the lecture videos and materials from my Applied Machine Learning course at @Cornell_Tech! We have 20+ lectures on ML algorithms and how to use them in practice. [1/5]
20
282
1K
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
It's crazy how many modern generative models are 15-year-old Aapo HyvΓ€rinen papers. Noise contrastive estimation => GANs. Score matching => diffusion. Ratio matching => discrete diffusion. If I were a student today, I'd carefully read Aapo's papers; they're a gold mine of ideas.
Tweet media one
10
110
1K
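For reference, the score matching objective from HyvΓ€rinen (2005) that score-based diffusion later built on, in its standard form (up to an additive constant):

```latex
J(\theta) = \mathbb{E}_{p_{\mathrm{data}}(x)}\!\left[
  \tfrac{1}{2}\,\bigl\lVert \nabla_x \log p_\theta(x) \bigr\rVert^2
  + \operatorname{tr}\!\bigl( \nabla_x^2 \log p_\theta(x) \bigr)
\right]
```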
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Do you know what's cooler than running LLMs on consumer GPUs? Finetuning large 65B+ LLMs on consumer GPUs! πŸ€– Check out my new side project: LLMTune. It can finetune 30B/65B LLAMA models on 24GB/48GB GPUs.
Tweet media one
16
160
738
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
6 years
Compressing MNIST into 10 synthetic images that give almost the same performance as the full dataset. Cool paper!
Tweet media one
7
167
648
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 years
Ok, I'm sorry, but this is just brilliant. Folks argue that AI can't make art, but look: (1) DALLE2 distills the essence of NY in a stunning & abstract way, (2) each pic has unique visual language (bridge-loops in #1!?), (3) it *builds* (not copies) something new on top of Picasso!
@hardmaru
hardmaru
3 years
Picasso minimal line art of New York City #dalle
Tweet media one
Tweet media two
Tweet media three
23
84
626
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Excited to announce the newest update to the Cornell Open Applied ML course! We are releasing 16 chapters of open online lecture notes covering topics across ML: neural networks, SVMs, gradient boosting, generative models, and much more.
Tweet media one
9
130
613
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
We are publishing online the notes for the Probabilistic Graphical Models course at Stanford (work in progress!)
Tweet media one
5
179
522
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
The award for the most clearly understandable poster goes to… this person!
Tweet media one
9
24
520
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 months
Mamba? DeltaNet? RWKV? RetNet? Here is a nice chart in which they’re all variants of the same formula.
Tweet media one
5
37
417
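The chart itself isn't reproduced here, but the unifying form is usually written as a matrix-valued recurrence over a state S_t (my paraphrase; notation varies across papers):

```latex
S_t = A_t S_{t-1} + v_t k_t^{\top}, \qquad o_t = S_t\, q_t
% Linear attention: A_t = I;   RetNet: A_t = \gamma I;
% Mamba-style SSMs: A_t diagonal and input-dependent;
% DeltaNet: A_t = I - \beta_t k_t k_t^{\top}.
```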
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Here is an experiment: using ChatGPT to emulate a Jupyter notebook. You can even get it to run GPT inside ChatGPT. And you can also train neural networks from scratch inside ChatGPT. 🀯 Here's a walkthrough of how it works.
Tweet media one
9
54
407
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
ICLR decisions are now public, and it's confirmed that the recent (pretty high-profile) Mamba paper didn't get in. It's useful to read OpenReview to see how subjective the peer review process can be. The moral is: don't stress if your paper doesn't get in on the first try!
Tweet media one
7
37
337
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
Christopher Bishop is publishing a new textbook!
2
113
321
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 months
If you're at #AAAI2025, try to catch Cornell PhD student @yingheng_wang, who just presented a poster on Diffusion Variational Inference. The main idea is to use a diffusion model as a flexible variational posterior in variational inference (e.g., as the q(z|x) in a VAE) [1/3]
Tweet media one
3
22
332
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Two-bit and three-bit LLMs are almost here! QuIP yields the first usable two-bit LLMs and further reduces the cost of running LLMs on just one GPU. [1/4] paper: code:
Tweet media one
4
84
315
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
I love this chart from the AI index report. You can clearly see the peak of '80s AI hype and its slow drop-off. We seem to have just matched the conference attendance numbers from back then.
Tweet media one
8
156
280
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 years
Imagine you build an ML model with 80% accuracy. There are many things you can try next: collect data, create new features, increase dropout, tune the optimizer. How do you decide what to try next in a principled way?
Tweet media one
2
55
271
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
New paper with @ermonste on accurate uncertainties for Bayesian deep learning. Addresses model overconfidence which arises from misspecification and computational approximations. Will be presented at @icmlconf next week!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
85
267
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
My weekend side project: MiniLLM, a minimal system for running modern LLMs on consumer GPUs ✨
🐦 Supports multiple LLMs (LLAMA, BLOOM, OPT)
βš™οΈ Supports NVIDIA GPUs, not just Apple Silicon
πŸ§šβ€β™€οΈ Tiny, easy-to-use codebase in Python (<500 LOC)
Tweet media one
9
48
260
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
Excited to finally release our open course on deep generative models! This material has been taught at Stanford/Cornell/UCLA since 2019. It includes:
πŸŽ₯ 20 hours of video lectures
✨ 17 sets of slides
πŸ“– Lecture notes
Youtube: Site:
Tweet media one
9
51
260
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
5 months
Neat and simple NeurIPS paper: take a pre-trained BERT and generate from it auto-regressively (sequentially unmasking the last token in the context window). This does surprisingly well on benchmarks.
Tweet media one
4
16
253
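A minimal sketch of that decoding loop, assuming a HuggingFace BERT with greedy unmasking (the paper's actual sampling setup likely differs):

```python
# Generate left-to-right from BERT by repeatedly appending a [MASK],
# predicting it, and committing the most likely token.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

ids = tok("The meaning of life is", return_tensors="pt")["input_ids"][0][:-1]  # drop [SEP]
for _ in range(20):
    inp = torch.cat([ids, torch.tensor([tok.mask_token_id, tok.sep_token_id])])
    with torch.no_grad():
        logits = model(inp.unsqueeze(0)).logits[0, -2]  # logits at the [MASK] slot
    ids = torch.cat([ids, logits.argmax().view(1)])     # greedy: commit best token
print(tok.decode(ids))
```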
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
4 years
As promised, here is a summary of @cornell_tech Applied ML 2021 Lecture 1: "What is ML?". The main idea is that machine learning is a form of programming, where you create software by specifying data and a learning algorithm instead of writing traditional code.
Tweet media one
2
34
235
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
Super-resolution for audio. Our expanded paper is now on arXiv + implementation + companion website:
Tweet media one
Tweet media two
2
84
228
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 years
What are the benefits of using deep learning in causal inference? Thoughts based on Monday morning's ICML tutorial on causality + my own opinions πŸ‘‡ Slides are from the tutorial (link is below).
Tweet media one
5
26
226
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
Loved this nice and simple idea for better data selection in LMs. First, use high-level features to describe high-value data (e.g., textbook chunks). Then use importance sampling to prioritize similar data in a large dataset. ⁦@sangmichaelxie⁩
Tweet media one
2
29
228
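Roughly, the recipe in code (a simplified sketch in the spirit of this idea, not the authors' implementation; hashed-bigram features and add-one smoothing are my stand-ins):

```python
# Score raw documents by log p_target(features) - log p_raw(features),
# then sample with probability proportional to the importance weight.
from collections import Counter
import math, random

BUCKETS = 10_000

def feats(text):  # hashed bigram features
    toks = text.lower().split()
    return [hash(" ".join(toks[i:i+2])) % BUCKETS for i in range(len(toks) - 1)]

def log_prob(fs, counts, total):  # add-one smoothed log-likelihood
    return sum(math.log((counts[f] + 1) / (total + BUCKETS)) for f in fs)

def select(raw_docs, target_docs, k):
    tc, rc = Counter(), Counter()
    for d in target_docs: tc.update(feats(d))
    for d in raw_docs: rc.update(feats(d))
    t_tot, r_tot = sum(tc.values()), sum(rc.values())
    w = [log_prob(feats(d), tc, t_tot) - log_prob(feats(d), rc, r_tot)
         for d in raw_docs]
    m = max(w)
    probs = [math.exp(x - m) for x in w]  # shift by max for stability
    return random.choices(raw_docs, weights=probs, k=k)
```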
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
πŸ“’ Announcing the newest edition of the Cornell Tech open machine learning course!
- πŸ“Ί 30+ hours of lecture videos
- πŸ“• Lecture notes, slides, and code for 20+ lectures
- 🌎 A brand new website
Check it out here:
Tweet media one
4
67
213
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 months
Introducing test-time compute scaling for discrete diffusion LLMs! This almost matches AR LLM sample quality using diffusion. In traditional masked diffusion, once a token is unmasked, it can never be edited again, even if it contains an error. Our new remasking diffusion allows
Tweet media one
@Guanghan__Wang
Guanghan Wang
3 months
We are excited to present ReMasking Diffusion Models (ReMDM), a simple and general framework for improved sampling with masked discrete diffusion models that can benefit from scaled inference-time compute. Without any further training, ReMDM is able to enhance the sample quality
4
23
200
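Conceptually the sampler change is small. A toy sketch of the remasking idea (my paraphrase, not the ReMDM code; the confidence-based unmasking order and fixed remask probability `eta` are illustrative simplifications):

```python
import torch

def remasking_sampler(model, length, steps, mask_id, eta=0.05):
    # model(x) is assumed to return per-position logits of shape (1, L, V);
    # run steps >= length so every position gets decoded at least once.
    x = torch.full((1, length), mask_id)
    for _ in range(steps):
        probs = model(x).softmax(-1)
        conf, tok = probs.max(-1)                      # (1, L) confidences + argmaxes
        masked = x == mask_id
        idx = torch.where(masked, conf, torch.tensor(-1.0)).argmax()
        x[0, idx] = tok[0, idx]                        # unmask most confident slot
        # the remasking twist: occasionally remask a decoded token so that
        # earlier mistakes can be revised at a later step
        decoded = (x != mask_id).nonzero()
        if len(decoded) and torch.rand(()) < eta:
            j = decoded[torch.randint(len(decoded), (1,))].squeeze(0)
            x[0, j[1]] = mask_id
    return x
```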
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
✨ Simple masked diffusion language models (MDLM) match autoregressive transformer performance within 15% at GPT-2 scale for the first time!
πŸ“˜ Paper: πŸ’» Code: πŸ€– Model: 🌎 Blog: [1/n] πŸ‘‡
Tweet media one
4
38
194
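As I recall it (roughly; check the paper for exact notation), the headline MDLM objective is just a weighted masked cross-entropy, integrated over the noise schedule:

```latex
\mathcal{L}_{\mathrm{NELBO}} = \mathbb{E}_{q}\!\int_{0}^{1}
  \frac{\alpha_t'}{1-\alpha_t}
  \sum_{i \,:\, z_t^i = \mathbf{m}}
  \log \bigl\langle x_\theta^i(z_t),\, x^i \bigr\rangle \, dt
% z_t: partially masked sequence; m: the mask token;
% \alpha_t: masking schedule; x_\theta: predicted clean tokens.
```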
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
6 years
Another SOTA multi-task NLP result from MSFT. What is it about the Transformer architecture that is making these advances possible? Why have we not seen the same results with LSTM-based methods like Dai and Le (2015)?
1
38
180
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
4 years
The slides and lecture notes that accompany my videos from the Applied Machine Learning course at Cornell are now available on GitHub! I'm sharing 20+ Jupyter notebooks that you can compile into HTML, PDF, or execute directly.
Tweet media one
2
47
171
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
They claim that replacing the final softmax layer in a neural network speeds up training by 33% on SOTA CIFAR-10 and ImageNet models:
3
56
170
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
Slides from my talk at the @uai2018 workshop on uncertainty in deep learning. Thanks again to the organizers for inviting me!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
3
40
165
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Last April, we released libraries for 3-bit and 4-bit LLM inference & finetuning. We've had a lot of interest in our code (including 1k+ stars on GitHub), and we're now officially releasing the underlying algorithm: ModuLoRA. ModuLoRA is the first method to finetune 3-bit LLMs!
Tweet media one
6
31
164
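The core trick, as I understand it (a hedged sketch, not the released code): keep the base weights frozen in low-bit form, dequantize on the fly in the forward pass, and train only the LoRA factors. `LoRAQuantLinear` and its per-tensor dequantization are illustrative stand-ins:

```python
import torch
import torch.nn as nn

class LoRAQuantLinear(nn.Module):
    """LoRA adapters on top of a frozen low-bit base layer (toy version)."""
    def __init__(self, qweight, scale, rank=16):
        super().__init__()
        self.register_buffer("qweight", qweight)   # frozen low-bit weights
        self.register_buffer("scale", scale)       # quantization scale
        out_f, in_f = qweight.shape
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(out_f, rank))        # trainable
    def forward(self, x):
        w = self.qweight.float() * self.scale  # dequantize (quantizer-agnostic)
        return x @ (w + self.B @ self.A).T     # frozen base + low-rank update
```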
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
It's surprising how little core ML innovation was needed to create Sora. As with GPT, the OAI team took a proven architecture (latent diffusion) and scaled it to massive data, with incredible results. Still, it's interesting to look at the details that OAI chose to reveal. 1/πŸ‘‡
Tweet media one
3
15
159
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
New paper on black-box learning of undirected models using neural variational inference. Also speeds up sampling and helps estimate the partition function. Our #nips2017 paper is online here:
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
46
153
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
4 years
Update on my fall 2021 @cornell_tech applied ML course. Each week, I will be releasing all the slides, lecture notes, and course materials on GitHub, and I also plan to post summaries of each lecture on Twitter. Everybody is welcome to follow along!
1
23
135
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
New update from the world of LLM quantization: QuIP will appear at NeurIPS 2023, and our updated paper is now on arXiv. QuIP is the first method that gets useful results out of LLMs quantized using as little as 2 bits/weight. Camera-ready:
Tweet media one
8
24
133
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
And the prize for the most technologically sophisticated poster goes to… this author! Two built-in tablets: the first time I've seen this at a conference.
Tweet media one
4
13
124
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 years
I really enjoyed the ICML tutorial on causality and fairness by Elias Bareinboim and Drago Plecko. My summary and some thoughts below πŸ‘‡
Tweet media one
1
17
121
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
✨ Introducing diffusion with learned adaptive noise, a new state-of-the-art model for density estimation ✨ Our key idea is to learn the diffusion process from data (instead of it being fixed). This yields a tighter ELBO, faster training, and more! Paper:
Tweet media one
2
29
117
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
27 days
If you’re at #iclr2025, you should catch Cornell PhD student @SchiffYairβ€”check out his new paper that derives classifier-based and classifier-free guidance for discrete diffusion models.
Tweet media one
@SchiffYair
Yair Schiff
5 months
Diffusion models produce high-quality outputs and have powerful guidance mechanisms. Recently, discrete diffusion has shown strong language modeling. ❓ Can we adapt guidance mechanisms for discrete diffusion models to enable more controllable discrete generation? 1/13 🧡
0
10
121
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
6 years
This is absolutely mind-blowing. Finding these kinds of disentangled representations has been a goal of generative modeling research for years. The fact that style-based GANs do it so well in a purely unsupervised way and without a strong inductive bias is just crazy.
@goodfellow_ian
Ian Goodfellow
6 years
An exciting property of style-based generators is that they have learned to do 3D viewpoint rotations around objects like cars. These kinds of meaningful latent interpolations show that the model has learned about the structure of the world.
3
26
113
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
6 years
I’d like to share a little update on what I’ve been up to in the past two years. πŸ™‚
3
22
114
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 years
Cool work on semantic manipulations in diffusion models:
Tweet media one
2
13
105
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
5 months
Masked diffusion language modeling to appear at #NeurIPS2024. MDLM matches the quality of autoregressive transformers using diffusion! πŸ”₯ Also featuring new improvements since this summer: connections to score estimation, a pre-trained model, and a full codebase πŸ‘‡
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
✨ Simple masked diffusion language models (MDLM) match autoregressive transformer performance within 15% at GPT-2 scale for the first time!
πŸ“˜ Paper: πŸ’» Code: πŸ€– Model: 🌎 Blog: [1/n] πŸ‘‡
Tweet media one
1
16
107
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
6 years
New paper out in @NatureComms! GWASkb is a machine-compiled knowledge base of genetic studies that approaches the quality of human curation for the first time. With @ajratner @bradenjhancock @HazyResearch @yang_i_li et al. Last paper from the PhD years.
2
13
88
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
6 years
"A Guide to Deep Learning in Healthcare" out in @NatureMedicine this week! Joint work with amazing team of collaborators @AndreEsteva @AlexandreRbcqt @rbhar90 @SebastianThrun @JeffDean M DePristo, C. Cui, K. Chou, G. Corrado. Paper:
1
33
86
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
2-bit LLaMAs are here! πŸ¦™βœ¨ The new QuIP# ("quip-sharp") algorithm enables running the largest 70B models on consumer-level 24GB GPUs with only a minimal drop in accuracy. Amazing work led by Cornell students @tsengalb99 @CheeJerry + colleagues @qingyao_sun @chrismdesa. [1/n]
2
20
81
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
Model weights can be quantized into a small # of bits during inference. If we're a bit clever, we can also directly train quantized weights! Experiments should be *much* faster. Or fit many quantized models, then fine-tune the best one at full precision.
Tweet media one
Tweet media two
0
27
83
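One standard way to make "directly training quantized weights" differentiable is a straight-through estimator; a minimal sketch (my illustration, not tied to any particular paper):

```python
import torch

class STEQuantize(torch.autograd.Function):
    """Round weights in the forward pass; pass gradients straight through."""
    @staticmethod
    def forward(ctx, w, bits=4):
        scale = w.abs().max() / (2 ** (bits - 1) - 1)
        return torch.round(w / scale) * scale   # uniform symmetric quantization
    @staticmethod
    def backward(ctx, grad):
        return grad, None                       # identity gradient w.r.t. w

class QuantLinear(torch.nn.Linear):
    def forward(self, x):
        return torch.nn.functional.linear(x, STEQuantize.apply(self.weight), self.bias)
```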
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
Wow, this year's Burning Man theme is Artificial Intelligence! The description talks about automation, loss of jobs, safety, etc. A mix of valid concerns and hype. In any case, it should be a lot of fun.
Tweet media one
4
23
80
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
At #icml2018 in Stockholm this week! Come check our poster on improved uncertainty estimation for Bayesian models and deep neural networks on Thursday.
Tweet media one
1
23
83
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 months
Official results from @ArtificialAnlys benchmarking the speeds of Mercury Coder diffusion LLMs at >1,000 tok/sec on Nvidia GPUs. This is a big deal for LLMs: previously, achieving these speeds required specialized hardware. Diffusion is a "software" approach that gets to the.
@ArtificialAnlys
Artificial Analysis
3 months
Inception Labs has launched the first production-ready Diffusion LLMs. Mercury Coder Mini achieves >1,000 output tokens/s on coding tasks while running on NVIDIA H100s - over 5x faster than competitive autoregressive LLMs on similar hardware. Inception’s Diffusion LLMs (β€œdLLMs”)
Tweet media one
8
6
77
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Love to see my machine learning lectures being shared online. Also, please stay tunedβ€”I will be announcing a major new update to my online class in the next few days.
@Jeande_d
Jean de Nyandwi
2 years
Applied Machine Learning - Cornell CS5785. "Starting from the very basics, covering all of the most important ML algorithms and how to apply them in practice. Executable Jupyter notebooks (and as slides)". 80 videos. Videos: Code:
Tweet media one
2
12
75
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
We're hosting a generative AI hackathon at @cornell_tech on 4/23! Join us to meet fellow researchers & hackers, listen to interesting talks, access models like GPT4, and build cool things. Sponsors: @openai @LererHippeau @AnthropicAI @BloombergBeta. RSVP:
5
13
74
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
4 years
After a summer break, I’m getting back to tweeting again. This Fall, I’m teaching Applied ML at @cornell_tech in this huge room (and this time in person). Stay tuned for updates as I’ll be sharing a lot of the lecture videos and materials over the next few months!
Tweet media one
Tweet media two
2
1
73
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Autumn gradient πŸƒπŸ‚πŸ
Tweet media one
2
2
71
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
If you didn't get an invite to the Elon Musk / Tesla party tonight, come check out our poster on Neural Variational Inference in Undirected Graphical Models at board #108 :)
Tweet media one
0
10
73
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
Benchmarking all the sklearn classifiers against each other. Gradient boosting comes out on top (confirming general wisdom)
@randal_olson
Randy Olson
8 years
Our preprint on a comprehensive benchmark of #sklearn is out. Some interesting findings! #MachineLearning #dataviz.
Tweet media one
1
19
68
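A toy version of this kind of benchmark (three classifiers, one dataset; the preprint compares many more):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for clf in (LogisticRegression(max_iter=5000),
            RandomForestClassifier(),
            GradientBoostingClassifier()):
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold accuracy
    print(f"{type(clf).__name__}: {scores.mean():.3f}")
```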
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
Finally read this excellent paper. It shows problems where gradients are the same for different models, hence gradient descent fails
Tweet media one
0
15
67
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
One weird trick to scale your diffusion models to high-resolution images: tune the noise schedule (and sprinkle in a bit of Google-level compute). High-res diffusion without latent diffusion.
Tweet media one
2
7
65
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
Stanford researchers announced a new 500TB Medical ImageNet dataset at #gtc. The project webpage is already up:
0
33
63
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
#Neurips2022 is now over---here is what I found exciting this year. Trends include creative ML, diffusion models, language models, LLMs + RL, and some interesting theoretical work on conformal prediction, optimization, and more.
Tweet media one
1
4
65
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
First legitimate-looking replication of the Meissner effect in LK-99. What a great day for science!
Tweet media one
0
13
62
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
How can deep learning be useful in causal inference? In our #NeurIPS2022 paper, we argue that causal effect estimation can benefit from large amounts of unstructured "dark" data (images, sensor data) that can be leveraged via deep generative models to account for confounders.
Tweet media one
1
4
61
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
6 years
@NandoDF @ilyasut @icmlconf @iclr2019 Don't the benefits of increased reproducibility and rigor on the part of the authors greatly outweigh any potential misuses of their work, at least for the vast majority of ICML/ICLR papers? I think the current shift towards empirical work puts a greater need on releasing code.
4
1
57
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
Cool reddit thread on beautiful ML papers. But why are they all DL papers?? :) How about Wainwright and Jordan? Or the online learning stuff by Shalev-Shwartz? DL is awesome but, come on, that's not all there is to ML :) My vote goes to this paper:
@niru_m
Niru Maheswaranathan
7 years
This reddit thread on beautiful papers in machine learning is pretty great:
2
4
56
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
I thought the NIPS tutorial on deep learning on graphs was quite interesting. Uses spectral representations of graphs, which are really fascinating in themselves. Lots of potential scientific applications. I'd like to learn more about generative techniques.
1
11
58
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Question to the panelβ€”what are the open problems in diffusion generative models? Top answerβ€”generalizing to discrete sequences and coming up with good corruption processes for that domain.
Tweet media one
2
5
55
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
A paper describing the DenseNet idea in 1988. Another example of researchers (independently) reinventing an idea from the last AI summer.
@rupspace
Rupesh Srivastava
7 years
Densenet from 1988. Found during skip connections lit review for my thesis, courtesy Faustino Gomez. Please cite if you use Densenets (1/2)
Tweet media one
1
26
55
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
The ICLR paper decisions haven't even been made yet, yet text-to-3D models are already being deployed into a commercial app by Luma. What a crazy year! Shoutout to @poolio for the original work on DreamFusion.
@LumaLabsAI
Luma AI
2 years
✨ Introducing Imagine 3D: a new way to create 3D with text! Our mission is to build the next generation of 3D and Imagine will be a big part of it. Today Imagine is in early access and as we improve we will bring it to everyone
1
4
54
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 years
It’s awesome to be back at @cornell_tech! Kicking off the fall semester with some stunning views of NYC.
Tweet media one
Tweet media two
Tweet media three
2
1
52
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
My take on Wasserstein GANs: slower training, convergence metric works, not always more stable. TF/Keras code here:
1
13
54
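For reference, the critic update behind the convergence metric mentioned here (the original weight-clipping WGAN; a sketch, with `D` any critic network):

```python
import torch

def critic_step(D, opt, real, fake, clip=0.01):
    opt.zero_grad()
    loss = D(fake).mean() - D(real).mean()  # minimizing this maximizes E[D(real)] - E[D(fake)]
    loss.backward()
    opt.step()
    for p in D.parameters():
        p.data.clamp_(-clip, clip)          # crude Lipschitz constraint
    return -loss.item()                     # estimate of the Wasserstein gap
```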
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
As far as I can tell (and I might be wrong), Google Research was just renamed Google AI. What about all the work in systems, crypto, econ? It just seems wrong to call all of Google's CS research "AI".
5
9
48
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
This talk at the NIPS creative ML workshop was amazing. One of the best results on music generation with RNNs that I've seen so far.
@ada_rob
Adam Roberts
7 years
Excited to share my newest work #MusicVAE for interpolating and sampling melodies, beats, and three-part song segments from a VAE! Listen to samples and create your own in the #colab notebook (link in YT description) w/ @jesseengel @deck #magenta #nips2017.
1
9
51
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
Here are 9 predictions for AI in 2024 πŸŽ‰πŸŽŠ. 1️⃣ Planning will take a greater role in generative modeling. Models will increasingly β€œthink” at inference time, trading off compute for output quality. In many applications (β€œgenerate a good molecule”), this will make a ton of sense.
4
7
50
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
This blew my mind. GAN-style models for identifying causal mutations in a GWAS with adjustment for population-based confounders.
@dustinvtran
Dustin Tran
8 years
Dave Blei and I are excited to share β€œImplicit causal models for genome-wide association studies”
0
12
46
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 years
Rally in support of Ukraine now in Times Square and for the next several hours. Another one starts at 2pm in Greenwich Village. Come show your support.
Tweet media one
0
13
46
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
6 years
Earlier this month at #icml2019, we presented new work which examines the question of what uncertainties are needed in model-based RL. Taking inspiration from early work on scoring rules in statistics, we argue that uncertainties in RL must be *calibrated*. [1/10]
Tweet media one
1
8
49
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
As usual, the @agi_house hackathon doesn’t disappoint! Demos are going to start soon. 🦾
Tweet media one
Tweet media two
Tweet media three
2
3
49
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 years
War in Ukraine, Day 2 (Feb 25). I will be summarizing key events of the day based on what I hear from friends on the ground and based on reports from western and Ukrainian media.
Tweet media one
1
3
44
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
πŸ“’ One of my students, Phil Si, is applying for PhD programs this cycle. Phil's ICLR paper on quantile flows is an exciting improvement over neural autoregressive flows, making them applicable not just to density estimation, but also to generation. A short thread 🧡
Tweet media one
1
6
41
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
After Google and Baidu, Facebook is also publishing a neural text-to-speech system. Like Deep Speech 3 (and Lyrebird's unpublished demo), it quickly generalizes to new speakers. This research area is really interesting.
Tweet media one
Tweet media two
0
14
45
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Outstanding paper talk by @poolio on DreamFusion. Use existing text-to-image generative models to train text-to-NeRF models. Get a 3D render of a squirrel on a motorbike. #iclr2023
Tweet media one
1
3
42
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
Check out our new blog post on finetuning LLMs quantized to 2 bits using ModuLoRA. Unlike QLoRA, ModuLoRA works with any modern quantizer like QuIP# or OPTQ, and can outperform QLoRA on downstream tasks with 2x smaller models.
Tweet media one
1
16
44
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
How far can you push LLM quantization without hurting performance on downstream tasks? We could go pretty far by combining a state-of-the-art quantizer with ModuLoRA finetuning. On some tasks, our carefully finetuned 2-bit LLMs outperform existing 8-bit LLMs. A short thread πŸ‘‡
Tweet media one
1
13
42
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 month
Going to #ICLR2025? Come talk to me and the Inception Labs team about our next-generation Mercury diffusion LLMs! I'll be there April 24-28 ✈️
@InceptionAILabs
Inception Labs
1 month
We will be at #ICLR2025 this week! DM us if you're interested in connecting and learning more about our dLLMs. (We're hiring!).
2
1
41
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
7 years
Explaining some of the behavior of deep neural nets using older results from the 90s developed for boosting, support vector machines, etc. Cool stuff:
Tweet media one
1
5
37
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
6 years
And the new world record of information content per character in a tweet goes to…
@karpathy
Andrej Karpathy
6 years
Some ways of combining information in two branches of a net A & B: 1) A+B, 2) A*B, 3) concat [A,B], 4) LSTM-style tanh(A) * sigmoid(B), 5) hypernetwork-style convolve A with weights w = f(B), 6) hypernetwork-style batch-norm A with \gamma, \beta = f(B), 7) A soft attend to B, …?
0
2
40
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
In Hawaii for #ICML2023!
Tweet media one
3
1
36
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
Diffusion models produce great samples, but they lack a semantically meaningful latent space like in a VAE or a GAN. We augment diffusion with low-dimensional latents that can be used for image manipulation, interpolation, and controlled generation. #ICML2023
@SkyLi0n
Aaron Gokaslan
2 years
Thrilled to share our latest paper - Infomax #Diffusion! We're pushing the boundaries of standard diffusion models by unsupervised learning of a concise, interpretable latent space. Enjoy fun latent space editing techniques just like GANs/VAEs! Details:
Tweet media one
1
5
38
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
24 days
Watch the Mercury diffusion LLM in action πŸ§‘β€πŸ’». In this video, we vibe code a ViT in PyTorch using VS Code + Continue. The model nearly instantaneously generates big blocks of code based on user instructions.
3
8
38
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
If you’re at ICLR, you should try to catch Oscar (JHU undergrad), who is presenting his really cool TMLR paper on ModuLoRA, 2-bit finetuning of LLMs. Paper:
Tweet media one
1
7
38
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
City lights on a Saturday night ✨
Tweet media one
Tweet media two
Tweet media three
3
1
35
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
1 year
Did you know that word2vec was rejected at the first ICLR (when it was still a workshop)? Don’t get discouraged by the peer review process: the best ideas ultimately get the recognition they deserve.
@AravSrinivas
Aravind Srinivas
1 year
Tomas Mikolov, the OG inventor of word2vec, gives his thoughts on the test of time award, the current state of NLP, and ChatGPT. 🍿
Tweet media one
1
2
32
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
5 years
Excited to share that @afreshai raised another $12M to advance our mission of using AI to reduce waste across the food supply chain. We're always looking for talented folks who are passionate about AI, food, and the environment to join our growing team.
3
3
34
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
2 years
As the generative AI hackathon and post-hackathon events come to an end, I want to again thank everyone who attended! Incredibly grateful to @davederiso @agihouse_org for organizing the event! πŸ™ to @LererHippeau & the NYC VC community for sponsoring it
Tweet media one
2
6
34
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
3 months
Ars Technica wrote a short piece about our Mercury diffusion language models.
2
4
35
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
28 days
And the best poster award goes to… πŸŒˆπŸ¦„
Tweet media one
Tweet media two
1
0
35
@volokuleshov
Volodymyr Kuleshov πŸ‡ΊπŸ‡¦
8 years
Wow, "Understanding deep learning requires rethinking generalization" got perfect 10.0 reviews at ICLR (best score)!
1
10
33