Update: I left Meta yesterday. After 7.5 years.
I am sad, nervous, and excited.
Sad because I'll miss Meta! I've felt tremendously valued my entire time at Meta (first in FAIR and recently in GenAI). I'll miss the people and being in the thick of things.
Nervous because who in…
Very excited to introduce Humans of AI: Stories, Not Stats!
In this series, I interview AI researchers to get to know them better as people.
Starting next week, I will release two interviews every week as videos and podcast episodes. (Link 👇)
Introducing Make-A-Video3D! Generating 3D dynamic (mini) scenes from input text.
That is, text --> 4D!
Needs no 4D data (i.e., no dynamic 3D data), no static 3D data, no paired text-video data.
Paper:
Website:
Introducing AI Paygrades ()! Statistics of industry offers for AI jobs. The goal is to reduce information asymmetry so candidates can make informed decisions and negotiate better. Submit your information and spread the word! With
@abhshkdz
.
Yes, it’s here :) Zero-shot text-to-video generation!
Introducing Make-A-Video.
The new SOTA, by a large margin, in text-to-video generation. And it doesn’t use any paired text-video data!
Examples, paper, and a sign up sheet at
@MetaAI
Prompts 👇
My first blog post! Several people have asked me for time management advice over the years. I was encouraged to believe that there is a good chance others might find the advice useful too :) Hence this post. Thoughts are welcome!
I have a system to plan writing papers for conference deadlines. My students and some collaborators know about it. With the ICLR 2020 deadline coming up, I thought this might be a good time to share this with a wider audience.
Nine sets of "two kangaroos busy cooking dinner in a kitchen" 🙂
Generated by Make-A-Video.
(Montage courtesy Yaniv; This kangaroo example had become our go-to example in the last few days to the deadline :))
#MetaAIMakes
I am looking for interns in the AI for Creativity space. If you are an ML/AI graduate student, have done some work in the past in AI for Creativity, and are interested in an internship at FAIR during Summer 2021, please get in touch with me.
NeurIPS 💯🎉!
“all come all served”
Registration is $25 for students and $100 for non-students.
Tutorials, keynotes, and pre-recorded oral presentations are accessible without registration. Video recordings of the poster presentations will be released after the conference.
This is some advice I had shared with my lab on how to shorten your paper to fit the page limit. With the
#CVPR20
deadline coming up, I thought I'd share it widely.
Stefan Lee (
@stefmlee
), Dhruv Batra (
@DhruvBatraDB
), and I wrote up a blog post on some tips and principles we tend to follow when writing rebuttals. May be helpful for the upcoming
#ECCV2020
rebuttal deadline :)
Finally, a step towards generic vision+language models! One model that can answer questions, draw a box around an object described in a phrase, score an image-caption match, etc. 👆🏽 performance on 12 datasets with 1/12th the parameters! SOTA on 7 of 12 datasets after fine-tuning.
AI augmenting human creativity is exciting! Generative models are attractive for that. But users need to have more control.
Make-A-Scene does just that (in addition to being SOTA)!
Also check out the video of a lovely story Oran wrote and illustrated with this approach! 👇
.
@DhruvBatraDB
and I got tenure! Thank you
@ICatGT
@gtcomputing
@mlatgt
. Most of all, thanks to our students, postdocs and research scientists in the CVMLP labs -- first at Virginia Tech, and now at Georgia Tech -- for all the wonderful work over the years! You make this home.
More Origami.
This one took me 12 hours to fold (including some debugging when things weren't quite lining up right towards the end :)).
May not have been the best idea to do it all in one day :)
Finished reviewing NeurIPS papers! Yay!
Exciting stuff!
A (hopefully useful) tip based on recurring frustrations I encountered: Make sure everything you say in the paper is completely understandable based on what has been said in the paper so far.
Couple of specific examples:
As ML roles grow, we need scalable ways to test candidates' practical ML skills even before interviews. (CS coding tests don't correlate well with ML skills.)
Introducing — create challenges, invite candidates, see how they do!
Would this be of interest?
How's this for a plan? We
—Do our jobs well
—Be reliable
—Execute (align our actions with our goals)
—Remember that the world is nuanced
—Assume good intent
—Listen (whether or not we think we're heard)
—Be kind
—Be less peaky in our priors so evidence can play a role
Happy 2019!
Presenting ViLBERT! It learns visiolinguistic representations that transfer well. SOTA on VQA, captioning, referring expressions, visual commonsense reasoning -- all with minor additions to the base architecture.
Work led by
@jiasenlu
and
@stefmlee
.
Doesn't seem like the best time, but even before all things 2020, I've been apprehensive about sharing this. So here we are :)
This is my data point as a woman in AI.
Any reactions, stories, perspectives, feedback, or questions are very welcome.
New ML journal!
As Hugo says in the thread (and note the last one in particular)
- Uses OpenReview
- Focuses on conference-length publications
- Has no submission deadlines
- Aims for a fast turnaround
- Acceptance based on matched claims and evidence, not potential impact
Today,
@RaiaHadsell
,
@kchonyc
and I are happy to announce the creation of a new journal: Transactions on Machine Learning Research (TMLR)
Learn more in our post:
(I know the timing is not great, but this was recorded a couple of months ago.)
Episode 17 is out! Jeff Dean (
@JeffDean
) on Humans of AI: Stories, Not Stats.
Video:
Podcast:
All episodes so far:
Crowdsourced generative art gallery 👇🏽
This is art created and described by 66 anonymous individuals on Amazon Mechanical Turk. They used a “Create Your Own” tool from to make these. I'll post one ~every week.
#generativeart
#creativecoding
#crowdsourced
🎉🎉🎉 We gave artists and non-artists access to Make-A-Scene! Here's what they created and thought of it:
Make-A-Scene lets you sketch an image composition in addition to describing it, making it a more powerful tool for creative expression.
@MetaAI
All the talks are now online!
A big thank you to all the speakers for the amazing talks! Many attendees told us that they got a lot out of the talks and discussions, and many who missed the event have been pinging us for links to the slides and talks :)
In these interviews, we will try to see the human behind the work :)
We talk about who they are as a person, what their life is like, what they think about, are insecure about, get excited about. The story of their day-to-day life.
Stay tuned at !
So much relief all around. So much hope. So much positivity. It's been a while :)
It's crazy how inspiring just basic decency and coherence in thought feels...
🎉FAIR and Meta Open Arts in Times Square!🎉
Sofia Crespo's FAIR Artists in Residence project, Critically Extant, is being featured in Times Square Arts Midnight Moment every night this month.
@MetaAI
@soficrespo91
If you're in New York, go check it out 🙂
Very excited to announce Emu Edit and Emu Video!
Tell Emu Edit how you want an image edited and it will do precisely that.
Tell Emu Video what you want to see and it will generate a high quality video.
(Be sure to watch till the end!)
Links to a bunch of examples + papers👇
I don't like small talk. I like real connections. So I often "do questions" where everyone answers a question on the table. Meaningful discussions tend to follow.
It can be a struggle to think of questions though. So I made a website to help with that :)
Few things feel as good as (in no particular order):
1. A piece of your code doing what you want it to.
2. An empty inbox and to-do list.
3. High-bandwidth, successful communication of nuanced thoughts with a fellow human being.
Check out our demo of a single model for 8 different vision+language tasks! Give it an image + a question, it will answer it. Give it an image + a caption, it will score it. Give it an image + a phrase, it will draw a box around where that object is. Etc.
Humans of AI: Stories, Not Stats (at least this "season") is a wrap! I thoroughly enjoyed these 18 conversations. I hope you found these to be valuable. A huge thank you to all the 18 guests for taking the time to do these!
Anyone else jet lagged because of daylight savings? Anyone else can’t believe it is a thing to just move all clocks forward and backward twice a year?!
I started seeing some activity on this blog post -- so figured that's a good reminder to re-share this given the upcoming
#CVPR2021
rebuttal deadline :)
Keep in mind that this is just what we (
@stefmlee
,
@DhruvBatraDB
, and I) tend to follow, YMMV.
👋🏼 Emu
Turns out, a VERY small amount of EXTREMELY high quality fine-tuning data makes a HUGE difference in the quality of images generated using text-to-image models, without compromising on the generality of visual concepts they can depict.
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
paper page:
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models…
A playground for audio-video-text understanding and generation — a good way to explore problems in multimodal AI without needing a ton of compute and humongous datasets!
Paper, data, code, baselines available
I built this simple drawing tool that enforces symmetry (on top of a sketching UI Larry Zitnick had built).
It is surprisingly fun :) Give it a shot! And of course, share anything you make :)
I'll start :)
Every day in 2022, I wrote down a couple of lines of salient things from the day. Every Sunday, pulled out a couple of lines of salient things from the week. Same for every month. It's a nice, grounded summary of 2022. Was hoping to get more out of it, but I recommend trying it.
There's been excitement around
#BigSleep
by
@advadnoun
-- CLIP + BigGAN to generate images that match a description.
I was curious how it handles other languages.
I tried "ek khoobsurat phool" -- Hindi for "a beautiful flower" and got this 😮
Link to colab notebook 👇
I realized that I hate doing things at the last minute partly because then later I *have* to do that thing that's due. I find that suffocating. I do things ahead of time so I can always do what I want. That is, ~ironically?, I do things ahead of time so I can be irresponsible.
#CVPR2020
is happening! I oscillate between "Argh, I miss the in-person energy and activities" and "Ah, it is so nice to have access to all this content + people from the comfort of my home, being able to see and hear everything clearly [i-am-short]".
How is it going for you?
Unit Origami
It was easier to fold than I thought it would be before I saw the instructions. Totally worth trying out (even with printer or notebook or magazine paper you might have lying around). Link to instruction video I followed 👇
For junior faculty, I talk about strategies for growing your lab, setting a culture, investing in compute, being intentional about how you spend time, being resourceful, maximizing sparks of joy, not getting too attached to your first batch of students :)
Wonder how to start your faculty career? Grow in a company? Secure PhD offers? Check out our
#ICCV
'21 workshop on Share Stories and Lessons Learned. As a warm-up, we have released some recorded talks from
@dimadamen
@deviparikh
@xinshuoweng
@zhoubolei
and a few live talks upcoming!
Faster, multi-GPU, multi-image-batched, PyTorch implementation of faster R-CNN. Gets ~36.7 mAP on COCO in ~36 hours on 8 Titan Xs! By
@jw2yang4ai
and
@jiasenlu
.
Introducing a re-sliced version of Humans of AI: Stories, Not Stats!
We are releasing videos that contain answers from all guests to the same question. All thanks to the efforts of
@VarshiniSubhash
and
@mkulkhanna
!
Answers to question 1 👉
Lana Lazebnik’s short course on “Computer Vision: Looking Back to Look Forward” from when she was visiting Georgia Tech.
Amazing resource! Will give you a fresh perspective over only reading papers from the last few years or months or weeks :)
Should we switch from “Can you see my screen?” followed by long pause till someone unmutes and answers to “You should be able to see my screen now. Let me know if not.” and continue talking?
Generative models are a class of models that model the distribution of data — p(x). This definition says nothing about how the data is represented (pixels, language tokens, spatial image tokens, spatiotemporal video tokens, intermediate representations, etc.), or what the specific…
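To make the definition concrete, here's a minimal sketch (my own toy illustration, not from the original post): the simplest possible generative model estimates p(x) from data — here a one-dimensional Gaussian fit by maximum likelihood — and then samples new x from it. The same p(x) framing applies whether x is pixels, language tokens, or video tokens; only the representation and the model family change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": 10k scalar observations drawn from an unknown distribution.
data = rng.normal(loc=3.0, scale=1.5, size=10_000)

# Fit: maximum-likelihood estimates of a Gaussian p(x)'s parameters.
mu, sigma = data.mean(), data.std()

# Generate: draw brand-new samples from the learned p(x).
samples = rng.normal(loc=mu, scale=sigma, size=5)
print(f"learned p(x) = N({mu:.2f}, {sigma:.2f}^2), samples: {samples}")
```

Swapping the Gaussian for an autoregressive transformer over tokens changes the parameterization of p(x), not the underlying idea: fit a distribution to data, then sample from it.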
A fun project where Gunjan Aggarwal (
@gunjan050
) and I automatically generate (simple) music for an input dance!
Examples attached.
Paper:
Video of a live demo:
Very excited about our work on creative sketching!
Two datasets of ~10k sketches with part annotations
DoodlerGAN: A part-based GAN
(Super fun!) Web demo:
Paper + code:
Work led by Songwei Ge. With
@vedanujg
and Larry Zitnick.
Really good! My favorite (edited) snippet: lead or be led
If you are expecting to be told what to do, then someone will. It might not be the best thing to be doing.
Alternatively, if you show up with a convincing game plan, then people will get out of your way so you can do it.
Routine in India in the past few days: Wake up, email, one work to-do, breakfast, art to-dos (for upcoming project releases), lunch with family, art (
#genuary2022
), chit chat with family, dinner, chit chat with family, sleep. Repeat. Expecting >1 work to-dos starting tomorrow :)
Congratulations
@DhruvBatraDB
! 🎉🎉🎉
PECASE is the highest honor bestowed by the US Government to early-career scientists and engineers who show exceptional promise for leadership in science and technology.
SplitNet decouples perception and policy learning in visual navigation to allow for transfer across tasks and simulators (as a step towards sim2real transfer).
Video:
Code:
Paper:
A long video generation model that can train on clips that have 10s of frames but can generate videos that have >1000 frames.
Some specific technical details make this possible. Check it out!
Inspired by this, we converted two lab meetings to let's-teach-each-other-something-non-AI meetings. We covered Origami (theory and practice), Fountain Pens, Coffee, The Cup Song, and Games People Play In The Gym! It was awesome! You should try it in your circle of influence :)
Today we co-opted our regular group mtg into a🦃 teach-in, with drop-in cameos from alumni and friends all over.
Topics we taught each other ranged from Queen's Gambit to Bob Ross painting to photography to cryptocurrency get-rich schemes to a do-calculus tl;dr.
Happy Pandemic Thanksgiving!
Protip to
@CVPR
reviewers: Reviews are due Jan 4. You're not going to enjoy reviewing all papers over the weekend between Jan 2nd and 4th. So spread them out between now and when you plan on starting your winter break. You'll be at peace *and* the reviews will likely be better.
Cushions supports more than 5 billion possibilities🚀across 13 features. Curious to see which 200 will be revealed
@artblocks
on Jan 7th :)
Out of 1000 simulations, on average, two pieces have 4 out of the 13 features in common.
Ropsten mints 8, 17, 24, 44
#generativeart
A fun project where Songwei Ge and I looked at how well recent large-scale language and image generation models blend visual concepts e.g., "a moon that is sliced like an orange" or "a tree made of blue and red blood vessels".
I haven't found an app that comes close enough to this approach to time management. We should build one! Interested in building it? Fill this form out, and if we see a good fit, we'll reach out! We =
@abhshkdz
,
@DhruvBatraDB
, me.
Very excited to finally introduce the FAIR Artists in Residence program! What an honor and absolute pleasure to be working with these amazing AI artists -- Sofia Crespo (
@soficrespo91
), Scott Eaton (
@_ScottEaton_
) and Stephanie Dinkins!
Our
#AI
Artist in Residence program recently hosted three artists who use AI in their creative practices. As part of our AI for Creativity efforts, we celebrate the interdisciplinary collaboration between art and science using AI to enhance human creativity. Meet the artists:
I was asked to give a talk about my PhD workflow for the new students in our lab. So I put together a few slides about my approach to time management, preparing for
weekly progress meetings and staying on top of the literature. The slides are available at
I might be biased, but such a joy to read! Bonus: has pointers to interesting books, podcasts and people.
@abhshkdz
is defending in two weeks. It’s been inspiring working with him, and I hope to continue collaborating with him in one capacity or the other for a while to come!
ML
@GT
Ph.D. student Abhishek Das wants to stop
#climatechange
and develop
#AI
agents with human-level skillsets...and he's defending his dissertation soon! Get to know more about Abhishek in this month's edition of Meet ML
@GT
.
📝:
Super excited to announce Season 2 of Humans of AI: Stories, Not Stats! Hosted by
@DhruvBatraDB
this time :)
A big thank you to
@mkulkhanna
and
@VarshiniSubhash
for playing a huge role in putting Season 2 together!
I am excited to announce Season 2 of Humans of AI: Stories, Not Stats!
@deviparikh
created this series in 2020 and I am the host for Season 2, where I interview a cohort of 20 AI researchers to learn more about them.
Largest publicly available egocentric dataset and associated benchmark tasks!
Huge effort led by Kristen Grauman with a large team of collaborators.
Involved a consortium of 13 universities and labs across 9 countries, more than 2,200 hours of first-person video in the wild.
We’re announcing
#Ego4D
, an ambitious long-term project we’ve embarked on w/13 universities in 9 countries to advance first-person perception. This work will catalyze research to build more useful AI assistants, robots & other future innovations.