Excited to introduce LEDITS++, a novel way to edit real images with precision ✏️
- Multiple edits ✂️🔁
- Automagic free masking 🪄🎭
- 🆕 DPM-Solver fast inversion 🔀⚡
🤗 Try it:
🔗 Project:
📝 Paper:
I hacked
@huggingface
Spaces to build an open source
@gradio
Dreambooth Training UI that allows you to train a model for less than US$0.80 🐱💻 (you can also use it locally for free):
1 week of Stable Diffusion
A creative explosion is unfolding with Stable Diffusion, showing the power of open source at the state of the art!
We curated 23+ applications this week: new features, workflow integrations, UIs; run on Win, CPU, AMD, M1 and more!
After some, uh, developments yesterday:
- Stable Diffusion v1-5 is out by
@runwayml
- Fine-tuned image decoder (VAE) out by
@StabilityAI
Magic of open source🧙 collaboration continues no matter what, here's the Best Available Stable Diffusion™ notebook:
Very exciting 'breaking' news!
CompVis (the research group behind VQGAN) has just released a new 1.45B-parameter checkpoint for its Latent Diffusion model:
From the released images, it seems to have unprecedented text-synthesis capability. More to follow soon
Thanks
@angrypenguinPNG
for merging my PR to add high resolution to the Illusion Diffusion Space 📺🌀
It's just as fast, at double the resolution, and has crispy details - go play ▶️
Google just announced "Parti" - a text-to-image model from the same lab behind "Imagen"
"Parti" doesn't use diffusion models - rather it scales up Transformer + VQGAN architectures like DALL-E 1 and its open source replicas (dalle-pytorch, ruDALLE, DALL-E Mini)
ControlNet is cool, but what if you could have MORE control? 🤯
With MultiDiffusion Region Control you can 🎛️ draw masks ✏️ and give a specific prompt for each mask 📜
The
@gradio
demo is just out on
@huggingface
🤗 - kudos to the author
@omerbartal
!
Less than 1 minute guide on how to train your own LoRA with LoRA Ease 🧞‍♂️⚡
Train high-quality LoRAs on objects 📦, faces 😊, styles 🎨 or characters 🧑‍🎤 effortlessly and super cheap ༄
▶️
You can now finally create your own stock photo of yourself smiling while eating salad, in seconds 👨‍🎤🥗
IP-Adapter-FaceID Plus was silently released last week - it's the first inference-time technique where the face really captures my likeness 🥸🦚
▶️
It's out! 🥳 Visually browse the Stable Diffusion Concepts Library - and use 100+ community-taught concepts in your prompt directly on the same UI!
Colab with Gradio UI:
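For reference, loading one of these concepts with today's diffusers API is a minimal sketch like this (the `sd-concepts-library/cat-toy` repo and its `<cat-toy>` token are one example concept, not from this post):
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Pull a community-taught Textual Inversion concept from the library
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The concept's placeholder token can now be used directly in prompts
image = pipe("a <cat-toy> riding a skateboard").images[0]
image.save("cat_toy.png")
```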
How to train your own ControlNet? 🥅
We wrote a guide covering everything from deciding which controls to use 🎛️ and how to prepare your dataset, all the way to GPUs going brrr 🔥
(with an unexpected trip to the uncanny valley 👀)
From me and
@pcuenq
with ❤️
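Once a ControlNet is trained, plugging it into diffusers takes a few lines - a minimal sketch, here assuming the canny checkpoint and a pre-computed edge map:
```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Swap in your own trained ControlNet checkpoint here
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

control = load_image("canny_edges.png")  # your conditioning image
image = pipe("a futuristic city at dusk", image=control).images[0]
```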
The first large scale open source DALL-E 2 replication is here🧙
Karlo is an unCLIP model trained by
#KakaoBrain
I'm having fun playing with it on 🤗
@huggingface
Spaces:
Model card:
GitHub:
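Running Karlo through diffusers is a minimal sketch along these lines (based on the model card; the prompt is illustrative):
```python
import torch
from diffusers import UnCLIPPipeline

# Karlo is exposed in diffusers as an unCLIP pipeline
pipe = UnCLIPPipeline.from_pretrained(
    "kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16
).to("cuda")

image = pipe("a high-resolution photograph of a big red frog on a green leaf").images[0]
image.save("frog.png")
```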
Introducing LoRA the Explorer 🔎: browse the coolest SDXL LoRAs, play with them online ▶️, use locally 💿
(...and no need to dodge semi-naked waifus 🚫)
Join the fun 🕺
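Using any of these LoRAs locally is one extra line on top of the SDXL pipeline - a minimal sketch (the pixel-art LoRA is just an example; trigger words vary per LoRA):
```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Any SDXL LoRA repo from the Hub works here
pipe.load_lora_weights("nerijs/pixel-art-xl")
image = pipe("pixel art, a corgi astronaut on the moon").images[0]
```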
The Stable Diffusion Multi Inpainting Space is out!
On it you can do both: Inpainting by masking the image (with the newest
@Gradio
masking) or inpainting with words - your choice!
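The mask-based flavor maps directly onto the diffusers inpainting pipeline - a minimal sketch, assuming you already have an image and a mask:
```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = load_image("photo.png")  # original image
mask = load_image("mask.png")    # white = repaint, black = keep
result = pipe(
    prompt="a fluffy orange cat sitting on a bench",
    image=image,
    mask_image=mask,
).images[0]
```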
I'm super thrilled to announce that our assembly of the Latent Diffusion LAION-400M text-to-image model is now available on
@huggingface
🤗, further democratizing access to text-to-image AI art!
Thank you for all the help
@osanseviero
!
I'm delighted to announce I've joined
@huggingface
as a ML Art Engineer 🤗, to help make AI art even more accessible, easy to use and to develop for!
This tech is going to empower human expression and creativity in unprecedented ways - and building it openly feels like the right way!
Text-to-3D and Image-to-3D in 7 seconds 🤯 💨
That's LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation 🧊
And it's open source ✨
Try it ▶️
ControlNets are cool, but T2I-Adapters are 94% smaller 🤏 , and way faster 💨
Today TencentARC released 6 T2I Adapters for SDXL: depth, canny, lineart, openpose, and... DOODLY!
Come play:
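A minimal diffusers sketch of the sketch/doodle adapter (checkpoint name per TencentARC's SDXL adapter collection; parameter values are illustrative):
```python
import torch
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter
from diffusers.utils import load_image

adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-sketch-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter, torch_dtype=torch.float16, variant="fp16",
).to("cuda")

doodle = load_image("doodle.png")  # a rough black-and-white sketch
image = pipe(
    "a cozy cabin in a snowy forest, detailed illustration",
    image=doodle,
    adapter_conditioning_scale=0.9,  # how strongly the doodle steers the image
).images[0]
```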
Meta just released a new collection of their open-access "Seamless" translation models 🔊
They do speech-to-text, text-to-speech, speech-to-speech, text-to-text 💬🔄📝
The Expressive model keeps speech rate, pauses and style 🗣️
📁 Models and demos:
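A minimal transformers sketch of text-to-speech translation (the Expressive variant ships separately; this uses the standard M4T v2 checkpoint):
```python
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# English text in, French speech out (one model, many directions)
inputs = processor(text="Hello, how are you today?", src_lang="eng", return_tensors="pt")
audio = model.generate(**inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()
# `audio` is a 16 kHz waveform ready to save or play
```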
Collaborative new concepts on
#StableDiffusion
🎨
1. Teach Stable Diffusion new concepts 👩‍🏫 (add to the public library if you wish):
(or browse the library to pick one🧤 )
2. Run with the learned concepts 🖼️
Happy Public Domain day! 🎉
To celebrate Steamboat Willie finally joining the public domain, I created a
@huggingface
dataset with all frames of the 1928 short 🐭📜
▶️
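Loading the frames is a one-liner with 🤗 datasets - a minimal sketch (the repo id below is a hypothetical placeholder for illustration; search the Hub for the actual dataset):
```python
from datasets import load_dataset

# Hypothetical repo id, for illustration only
ds = load_dataset("multimodalart/steamboat-willie-frames", split="train")
frame = ds[0]["image"]  # a PIL image of one frame from the 1928 short
```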
Ok - I just quickly assembled the LAION-400M-trained Latent Diffusion CFG text-to-image model into a Google Colab, you can try it yourself:
"A mecha robot holding a sign that reads: 'This is weird'"
🎅 Ho-ho-ho! Today a bunch of ICLR 2023 papers dropped! It's a conference with blind submissions, so authors stay anonymous until review is over
A lot of multimodal AI: text-to-video (yes, another one), text-to-3D, another 'teach-diffusion-new-concepts', text-to-audio... and more! 🧵
OPEN TO EVERYBODY!
I optimized the Latent Diffusion LAION-400M Colab RAM usage and now it should run on free non-Pro accounts. And fast!
8 images in 20 seconds on a P4 GPU
Google Drive support and VRAM optimizations by
@RiversHaveWings
were also added
Stable Diffusion model card is up, and the weights are available for academic and research purposes first
This is the first step ahead of a full public release which should be coming soon! 🤩
#StableDiffusion
Stable Video Diffusion is an amazing (and chonky 🐼) new model by
@StabilityAI
- if you can't run it locally, you can now play with it on
@huggingface
Spaces 🤗
▶️
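If you do have the GPU for it, the diffusers usage is short - a minimal sketch:
```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

image = load_image("input.png").resize((1024, 576))
frames = pipe(image, decode_chunk_size=8).frames[0]  # smaller chunks = less VRAM
export_to_video(frames, "generated.mp4", fps=7)
```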
This week's updates were not only about DALL-E 2! We also got:
- Latent Diffusion LAION 400M (an open model!)
- KNN Diffusion paper (promising new approach to text-to-image)
- 3 new exciting TEXT-to-VIDEO models!
and more!
Check out our weekly update:
And the Space for the Stable Diffusion Concepts Library is out!
Navigate 250+ community-taught objects and styles made with Textual Inversion and use them in your prompts!
DALL-E Flow is an awesome new tool by
@JinaAI_
's
@hxiao
Like Centipede Diffusion, it is a mix of models:
It generates images with DALL-E Mega, refines and creates variations with Latent Diffusion, ranks the best with CLIP and upscales the results
Following the full open source release of Stable Diffusion, the
@huggingface
Space for it is out 🤗
Stable Diffusion is a state-of-the-art text-to-image model that was released today by
@StabilityAI
#stablediffusion
InstructPix2Pix by Tim Brooks allows you to write natural language instructions to edit images ✏️🖼️
We are getting closer and closer to "photoshop with words"! 🎨
Play with it now on
@huggingface
Spaces
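A minimal diffusers sketch of "photoshop with words" (instruction and parameter values are illustrative):
```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = load_image("portrait.png")
edited = pipe(
    "turn the jacket into a leather jacket",  # a natural-language instruction
    image=image,
    image_guidance_scale=1.5,  # higher = stay closer to the input image
).images[0]
```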
Since VQGAN+CLIP times, we've been learning to prompt with
@openai
CLIP knowledge (incl. SDv1, conditioned on OAI CLIP)
Stable Diffusion 2 breaks that 💥 with a LAION-trained CLIP: "trending on artstation" and "greg rutkowski" don't work anymore; we're all learning to prompt again! 👶
MindsEye - an open source interface to 'pilot' AI art models without using code - is now available to everyone
Check it out, share it around and let me know what you think!
Colab:
Discord:
Guide and FAQ:
Introducing Majesty Diffusion👑
Dango233's princesses were crowned queens, and Majesty Diffusion is born!
Two colabs are being released with plenty of new features - but I need your help with one thing, come with me🧶
PAG (Perturbed-Attention Guidance) is not getting nearly the attention it deserves, I've adapted it to work on SDXL with diffusers 🧨
...and it DELIVERS! 🤯
Try it here ▶️
thanks to the KU-CVLAB researchers Donghoon Ahn, Hyoungwon Cho et al. ❤️
Recent studies reveal that the quality of samples from diffusion models relies on techniques like CG and CFG, yet these fall short in unconditional generation and tasks like image restoration. This research paper introduces Perturbed-Attention Guidance (PAG), a novel method…
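In recent diffusers versions, PAG is a pipeline flag - a minimal sketch (parameter values are illustrative):
```python
import torch
from diffusers import AutoPipelineForText2Image

# enable_pag swaps perturbed-attention processors into the UNet
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
    enable_pag=True,
).to("cuda")

image = pipe(
    "an insect robot preparing a delicious meal",
    guidance_scale=7.0,
    pag_scale=3.0,  # strength of the perturbed-attention guidance term
).images[0]
```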
I've released MindsEye Lite👁️🧠: a UI that runs multiple text-to-image models without Colabs or logins - directly on Hugging Face Spaces
Run diffusion models, DALL-E replicas and VQGAN+CLIP. Try it out and consider sending it to someone that hasn't tried AI art yet!
🚨 A new text to image model by
@StabilityAI
is out!
It's Stable Cascade 💧 an iteration on the Würstchen architecture by
@dome_271
&
@pabloppp
I made a demo for it:
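A minimal sketch of the two-stage diffusers flow (the prior produces compact image embeddings, the decoder turns them into pixels):
```python
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to("cuda")
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to("cuda")

prompt = "an astronaut riding a horse, cinematic lighting"
prior_out = prior(prompt=prompt, num_inference_steps=20, guidance_scale=4.0)
image = decoder(
    image_embeddings=prior_out.image_embeddings.to(torch.float16),
    prompt=prompt,
    num_inference_steps=10,
    guidance_scale=0.0,  # the decoder stage runs without CFG
).images[0]
```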
Introducing ✨ LoRA Studio ✨ a dedicated UI by
@enzostvs
for LoRAs hosted on
@huggingface
🤗 browse and generate images with fun models 🎉
(and safe models, no need to worry if your mom or your colleague enter the room while you are browsing 😳 🔞)
▶️
Have you tried OOTDiffusion? 👕
A state-of-the-art diffusion-based virtual try-on model that just works with any person and any clothes ✨ - fully open source 💥
Official demo by Yuhao Xu:
▶️
This is fun! A new leap!
You show the model 3-5 images of what you want, it 'learns' what it is, and now you can use it in your prompts! And the approach is pluggable into different models (here they applied it to Latent Diffusion)
Code is not yet out - excited for it!
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
abs:
"Textual Inversion" operates by inverting concepts into new pseudo-words within the textual embedding space of a pre-trained text-to-image model
Two days ago,
@stabilityai
quietly released CosXL and CosXL Edit, fine-tuned SDXL models that can produce full color range images ⬛⬜
You can now try them out on
@huggingface
! 🕹️
▶️
SDXL Lightning is a new distilled SDXL model by ByteDance: LCM+progressive distillation+adversarial objective ⚡️
There are 1, 2, 4 and 8-step variations; below I'll test the prompt "An unicorn plush toy on the beach" 🦄 for every step 🧵
(my favorite is 4 steps 🦶)
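Adapted from the ByteDance model card, loading the 4-step UNet into a standard SDXL pipeline looks roughly like this:
```python
import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

base = "stabilityai/stable-diffusion-xl-base-1.0"
ckpt = "sdxl_lightning_4step_unet.safetensors"  # my favorite step count

# Replace the SDXL UNet with the distilled Lightning one
unet = UNet2DConditionModel.from_config(base, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download("ByteDance/SDXL-Lightning", ckpt), device="cuda"))
pipe = StableDiffusionXLPipeline.from_pretrained(
    base, unet=unet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Lightning expects trailing timesteps and no classifier-free guidance
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
image = pipe("An unicorn plush toy on the beach", num_inference_steps=4, guidance_scale=0).images[0]
```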
The
@huggingface
Hub now has `model templates`: instead of a blank `/new` page, you get a page tailored towards uploading a specific kind of model 📙🎨
The first model template is one of the most requested: SD LoRAs! Share it with your fine-tuner friends 🤗
fast & longer text-to-video with 🧨 diffusers
you may have seen the fun, junky text-to-video clips from ModelScope's research model lately
with diffusers you can control how long the video is - and fit it on smol VRAM GPUs, including free colab. Try out here:
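A minimal sketch of the longer-video + low-VRAM recipe (the return-value shape varies a bit across diffusers versions):
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
# The memory tricks that make it fit on small GPUs / free Colab
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

# num_frames controls the video length (the default is 16)
frames = pipe("a panda surfing a wave", num_frames=64).frames[0]
export_to_video(frames, "panda.mp4")
```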
Don't keep calm: the first Latent Consistency Model is out 🚀! It can generate good images in just 1, 2, 4, 8 blazing fast steps ⚡ (this video is not sped up)
(Distilled from an SD1.5 fine-tune in 32 A100-hours) 💎
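A minimal sketch of running it with diffusers (model id from the release; guidance behaves differently than in regular SD since it's distilled in):
```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7").to("cuda")

# LCMs only need a handful of steps - this is the "1, 2, 4, 8" magic
image = pipe(
    "a close-up photo of a majestic white owl, 8k",
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
```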
This week was 🔥, we got:
DALL-E Flow
Dall-E Mega reaching 50% of training
Centipede Diffusion added inpainting
OpenCLIP LAION-400M ViT-B/16+ released
CLIP-Forge (a text-to-3D shape model)
and more!
Check it out in our multimodal ai art weekly news:
Würstchen: a new high-res (1024x1024) model trained from scratch by
@dome39931447
Inference runs at a fraction of SDXL's cost. And it was trained with 6x less compute than SD1.4
Quality trade-offs 🤔? Try it for yourself!
PS: this video is not sped up!
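Trying it with diffusers is a minimal sketch like this (prior_guidance_scale steers the text-conditioned prior stage; values are illustrative):
```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "warp-ai/wuerstchen", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "an anthropomorphic cat dressed as a firefighter",
    height=1024, width=1024,
    prior_guidance_scale=4.0,  # guidance on the compressed-latent prior
).images[0]
```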
Introducing Face-to-All 👨‍🎤, a diffusers 🧨 workflow inspired by
@fofrAI
's amazing Face-to-Many ComfyUI workflow
Input a face, any style LoRA and get a stylized portrait
Colab with code:
Thanks
@Haofan_Wang
for merging our img2img pipeline to InstantID!
With the explosion of
#StableDiffusion
use-cases 🖼️, it's impossible for 🧨 diffusers maintainers to keep up 🥵
But with your help they don't have to! With community pipelines, the community jumps in 🤝 implementing cool use-cases or papers
Check it out!
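Using a community pipeline is one argument away - a minimal sketch with the long-prompt-weighting pipeline as an example:
```python
import torch
from diffusers import DiffusionPipeline

# custom_pipeline fetches the pipeline code from the diffusers community folder
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",  # long prompt weighting, e.g. (word:1.3)
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a (beautiful:1.3) sunset over the mountains, highly detailed").images[0]
```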
Introducing Grog 🖖
@Gradio
🤝
@replicate
's Cog
Grog automagically creates a Gradio UI for any Cog/Replicate model 🪄
Use:
- Locally: UI and backend in your machine 🖥️
- Replicate API: local UI, API backend 🌐
- Dockerfile: cloud or
@huggingface
Spaces
Kind of stealthily, Microsoft released "Improved VQ-Diffusion" - a follow-up on their technique that combines a VQ-VAE with diffusion
They released the code, the weights and a newly trained VQ-VAE
I'm running the first experiments:
diffusers 🧨v0.4.0 introduces (among many other amazing features) negative prompts to Stable Diffusion!
Now you can, keeping the same seed, remove specific objects, colors or concepts from your output 🟪🏙
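A minimal sketch of the same-seed trick (run twice with an identical generator seed, so only the negative prompt changes):
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of a busy city street at night"

# Identical seed both times, so the only difference is the negative prompt
gen = torch.Generator("cuda").manual_seed(42)
baseline = pipe(prompt, generator=gen).images[0]

gen = torch.Generator("cuda").manual_seed(42)
no_cars = pipe(prompt, negative_prompt="cars", generator=gen).images[0]
```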
🧨diffusers 0.4.0 is out: Better, faster, stronger!
🚀35% faster
#stablediffusion
in fp16
✨New scheduler API
➖negative prompts in
#stablediffusion
pipeline
🧹No more use_auth_token if you are logged in to the hub
🤩Community pipelines
More in release notes
Latent Majesty Diffusion 1.6, by
@dango233max
and me is out
New stuff: better defaults, fixed inpainting,
@laion_ai
models, defaults lib on
@huggingface
and way more
Pics: Hot Pot King, Lego Burger, Tamagotchi Ghost Wanderer, a business angel
Got DALL-E 3 via Bing, and there's a game-changing aspect that no one is talking about
prompt: "the line to the first feijoada restaurant in Tokyo" 🫘🗼
but do you see the 2nd line? It almost reads
"serving authentic brazil cuisine"
That's mind blowing! 🤯 Yup, Imagen, IF,…
This week on multimodal ai art news:
- Dall-E Mega sneak peek (try it yourself!)
- CLIP-GEN code released (further exploration needed)
- New StyleGAN XL 1024px model out
- Flamingo Visual Language Model announced
And more!
Check it out:
I saw a model that generated a long flythrough video from a single image - but I don't remember the name and can't find it anymore, does anyone remember/have a name for it?
Google is soon releasing a suite of generative AI APIs and no/low code interfaces for text generation with PaLM
But besides the text generation aspects, it seems they are also sneaking in an Imagen service/API release!
We're kicking off the Control Stable Diffusion Community Sprint! 🧨
@googlecloud
is kindly providing
@huggingface
with free TPU-v4 for you to train ways to Control Stable Diffusion (with ControlNet or otherwise) 🚀
Join us here!
Ramping up Stable Video Diffusion with 🧨 diffusers!
folks from
@diffuserslib
just merged SVD compatibility - I updated my demo to use it: it uses less VRAM 🤏 and runs faster 🏃🏻⚡️ with torch.compile(), which means smaller queues for the demo
▶️
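The torch.compile() bit is two lines on top of the SVD pipeline - a minimal sketch:
```python
import torch
from diffusers import StableVideoDiffusionPipeline

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Compile the UNet once; later calls reuse the optimized graph
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```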
Sneak peek of MindsEye Generator: a user interface to run multiple models (starting with Disco Diffusion) in one place
Platform agnostic, no need for a powerful GPU (as you'll be able to execute it from a Google Colab)
Will be starting a beta-testing soon, stay tuned!
The Stable Diffusion Collaborative library of textual inversion trained concepts was offline for a bit after a member of the community thought it would be funny to delete it...
Now it is back online 🔥 & protected while still open for collaboration 🤗
New Stable Diffusion XL LoRA, Ikea Instructions. SDXL does an amazingly hilarious job at coming up with how to make things. Special thanks to
@multimodalart
and
@huggingface
for the GPU grant!!
HF ->
Civitai ->
GLID-3: mindblowingly good photorealistic images
CLIPMatrix: text-guided 3D mesh stylization
LAION 5B: a 5B image-text pair dataset
4 new CLIP-like models
That's just a small glimpse of what happened in the last 7 days! Check out our weekly update:
Last 2 weeks in multimodal AI art:
First big text-to-video model out (CogVideo), more text-to-3D releases, 'image editing with text' getting better - and a bunch of community-trained diffusion models 🧵
Big release that flew under the radar 📡📟: last week
@ml6team
released Fondant-25M: a dataset of 25M image-text pairs with a
@creativecommons
license
And that's just the tip: they are working on a 500M one 🤯
Blog:
Dataset on 🤗
Happy birthday
@ak92501
! Thank you for keeping us updated with the state of the art on many different fronts in the field of Machine Learning!
Your curation is amazing and became a de-facto benchmark in the industry!
Sydney can draw! 🖌️
I asked Bing chat/Sydney to illustrate the Waluigi Effect and Unaligned AGI as an SVG image
Rule: it can't use the <img> tag (otherwise it tries to cheat with hallucinated URLs)
These are the things it came up with:
I'm having a lot of fun with the Pokemon Trainer Sprite LoRA by
@wizadwongsa
🐭🐦
It turns any prompt into a Pokemon trainer, any!
(can you guess them all?)
▶️
DALL-E 3 in one of my most challenging prompts
“The inauguration of a wormhole portal between Shanghai and New York. New York is inside the portal and Shanghai outside of it”
Now it's very easy to create a jupyterlab instance as a
@huggingface
Space 👽
It's free for CPU spaces, but if you have a cool GPU project (fine-tuning, testing), we provide grants ✍
▶️
Whoa, this is promising! Centipede Diffusion combines our Latent Diffusion Colab and Disco Diffusion by
@gandamu_ml
,
@Somnai_dreams
and
@zippy731
in a clever way (basically Disco is used as an upscaler for Latent!)
Will try it and report back soon!
OpenAI on DALL-E 2: "We used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures."
People on our Discord server with Majesty Diffusion: