wait so car keys don’t use asymmetric cryptography? you can unlock a car by just replaying the RF signals the key emits? my $20 raspberry pi zero has better security than my $20k car?
Releasing moondream2 - a small, open-source, vision language model designed to run efficiently on edge devices. Clocking in at 1.8B parameters, moondream requires less than 5GB of memory to run in 16-bit precision.
openai is the most talented and nicest group of people i have ever seen in one place
working on the hardest, most interesting, and most important problems
with all the key resources in place
extremely focused on making AGI
you should perhaps consider joining us
work has been interesting lately
got dinged for scheduling an all-hands meeting because the phrase “all-hands” is ableist (not a joke, DM for proof)
then my GPU instance order was rejected because there’s no capacity (my job is training ML models)
friday was my last day at AWS. I had a great 9 years and learned a lot but I’m excited to join the rest of society in complaining about AWS instead of defending it
ChatGPT refuses to solve CAPTCHA images, but luckily it's super easy to fine-tune moondream to do it. I just released a notebook showing how to do this.
@notmybagman sorry but i will not be taking any complaints about ChatGPT's cooling water usage while we're still subsidizing cotton farming in the Arizona desert
there is a company out there that spent $1.4B training a model you’ve never heard of because it was so bad. they had 16 people working on just the tokenizer.
Implemented inference for the Mixtral 8x7B model. Requires ~100GB of VRAM, so you can definitely run it on an 8x3090 or 8x4090 instance.
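A rough back-of-envelope for the ~100GB figure, assuming Mixtral's publicly reported ~46.7B total parameters (the experts share attention layers, so it's well under 8 × 7B) at 2 bytes per parameter; activation and KV-cache overhead ignored:

```python
# Weight memory for Mixtral 8x7B in 16-bit (fp16/bf16) precision.
total_params = 46.7e9      # reported total parameter count (assumption)
bytes_per_param = 2        # 16-bit precision
weights_gb = total_params * bytes_per_param / 1e9  # ~93 GB

# Eight 24GB consumer cards (3090/4090) total 192 GB of VRAM,
# leaving headroom for activations and the KV cache.
cluster_vram_gb = 8 * 24

print(f"weights: ~{weights_gb:.0f} GB of {cluster_vram_gb} GB available")
```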
(GitHub link in thread)
Releasing moondream0 today - a small vision language model based on SigLIP, Phi-1.5 and the LLaVa training dataset. This demo shows the model running purely on CPU using ~8GB of RAM.
can someone who is good with money help me balance my budget? i am currently funemployed and need to bring my burn rate down.
Rent ($1,800/mo) - $21,600
8xA100-40GB ($11/hr) - $96,624
Food ($600/mo) - $7,200
Annual Total - $125,424
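The line items above do check out; a quick sanity pass (note the GPU figure implies 8,784 billable hours, i.e. a 366-day leap year of 24/7 utilization):

```python
rent = 1_800 * 12        # $21,600/yr
food = 600 * 12          # $7,200/yr
gpu_hours = 366 * 24     # 24/7 for a leap year = 8,784 hours
gpus = 11 * gpu_hours    # $96,624 at $11/hr for the 8xA100-40GB node
total = rent + food + gpus
print(total)  # 125424
```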
it is with a heavy heart that i’m announcing shutting down all of my AI projects. will be focusing exclusively on linear algebra and stochastic differential equations going forward.
Cool paper - shows how to transfer knowledge from a teacher model to a student that is already pre-trained, without degrading the student's existing capabilities. The student may even end up outperforming the teacher.
Overcomes a shortcoming of traditional distillation techniques, which assume the student is untrained.
ever felt an emptiness in the pit of your stomach? that could only go away if you had a dataset with 1.5M question/answer pairs about images? if so, i'm here to help.
getting a lot of DMs asking how to get into computer vision. i am no expert, i can only share what i did:
1. follow @giffmana
2. read all of his papers
3. watch recordings of all of his talks on youtube
4. study every tweet he posts for extra alpha
> wake up
> new 2.7B model, nice
> wait it's actually 14B, 2.7B is "activated" but i still need all 14B in VRAM
> benchmarks compare it to a 7B model
> ???
what is the use-case for small/medium scale MoE models? why wouldn't you use a dense model instead? (serious question)
i have developed a new architecture that beats transformers on language modeling. i'm not going to release code, weights, or even a demo. you'll just have to trust me i uploaded a PDF to arxiv
@yifever
i applied a while ago and they ghosted me, which is not cool but ok you get a lot of applications understandable. but then they DM’d me after I released moondream asking if I was interested and then ghosted me again… wtf??
[on first date]
her: so, what are you passionate about?
me: i’m writing a 6,000 word essay on how MoE models are going through a hype cycle. they’re useful when serving at scale but open source research should focus on bringing back ReLU because — wait, where are you going?
I’ve been going through programming subreddits lately (looking for places to shill my AI code review product), and am starting to realize the future is not evenly distributed when it comes to AI-assisted programming.
Huge gap in the willingness to seriously try out new tools.
New moondream release out today! Mainly focused on improved OCR and captioning. If you're using moondream for image captioning definitely worth checking this one out!
Some notes on LLaVA-1.6:
1/ To increase image resolution without retraining the vision encoder, they feed in five crops of the image. This improves performance, but comes with additional computational cost due to increased image tokens (from 576 to 2144).
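The cost jump in that note can be made concrete. A 336px image through a ViT-L/14 encoder yields 24×24 = 576 patch tokens; with the five-crop scheme the post cites 2144 image tokens, and since attention cost over those tokens grows roughly quadratically, the compute increase is steeper than the token increase (a rough sketch, ignoring text tokens and any token-merging details):

```python
# Token counts from the LLaVA-1.6 note above.
base_tokens = 24 * 24   # 336px / 14px patches -> 576 tokens
crop_tokens = 2144      # five-crop scheme, as cited in the post

token_ratio = crop_tokens / base_tokens       # ~3.72x more image tokens
attn_ratio = token_ratio ** 2                 # ~13.9x more attention FLOPs
print(f"{token_ratio:.2f}x tokens, ~{attn_ratio:.1f}x attention cost")
```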
Just released a new revision of moondream2!
✅ Improved benchmark scores and instruction following
✅ Batch inference
✅ FlashAttention-2 support for the text model
seeing moondream trending on github is the only thing that brings me out of my seasonal affective disorder fugue. thank you all for the support!
new improved version should be out later today!
people who ask how a scrappy startup can win if a big company decides to compete with you understand nothing about startups and big companies, and are fundamentally unserious
I hate the phrase “trivial to build” - it’s always said by someone who builds nothing. Building is hard. Building is expensive. Building is impossible. Anything that’s built is a miracle.
the fact that VCs think it’s clever to ask for your secret sauce in the first call when you know they're invested in a competitor is really the core of the innovation economy
@chinesegon skeptical that anyone can live off investment returns with just $2M. assuming 8% returns and ignoring inflation that's $160K/yr. doesn't even cover my doordash bill. :(
i believe people are fundamentally good, and that AI tools should simply do what their users request instead of returning condescending responses about what's right or wrong
just to clarify, moondream2 is actually open source. apache 2.0. no weird non-standard licensing terms.
you can do whatever you want with it. it's probably already pre-approved by your company's legal department.
any seattle friends interested in building this drone and seeing if we can get it to fly with just vision input, instead of the usual accelerometer/gyro PID controller?
🥺👉👈
anthropic: “We believe that companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles.”
also anthropic:
Why this works: for effective feature learning in neural networks using an Adam optimizer, learning rate needs to be inversely proportional to the width (a.k.a. model dimension) when your width is large.
(screenshot from Tensor Programs V)
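A minimal sketch of that rule (the 1/width learning-rate scaling for hidden-layer weights under muP / Tensor Programs V); the base width and base learning rate here are hypothetical values you'd tune once on a small proxy model:

```python
def mup_lr(base_lr: float, base_width: int, width: int) -> float:
    """Scale the Adam learning rate for hidden-layer weights
    inversely with model width, per muP (Tensor Programs V)."""
    return base_lr * base_width / width

# Tune at a small proxy width, then transfer to larger widths.
print(mup_lr(1e-3, base_width=256, width=1024))  # 0.00025
print(mup_lr(1e-3, base_width=256, width=4096))  # 6.25e-05
```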
you’d think this is the exact scenario where one would want a local model instead of calling OpenAI.
who wants their production line to grind to a halt because the factory’s internet connection was flaky?
With OpenAI, Figure 01 can now have full conversations with people
-OpenAI models provide high-level visual and language intelligence
-Figure neural networks deliver fast, low-level, dexterous robot actions
Everything in this video is a neural network:
very little alpha in reading arxiv papers these days because the best insights are kept proprietary. luckily there's still tons of alpha in reading soviet papers from the 1970s
never trust numbers in model names
claims to be 1.3B parameters? may actually have anywhere from 1.4B to 1.9B parameters
claims to take 384x384 images? the correct size is probably actually 378x378
We can say right now, with a high degree of scientific certainty, moondream3 is going to be a lot smarter than moondream2 and moondream4 will be a lot smarter than moondream3, we are not near the top of this curve.
went to an ai meetup today, all the questions were like “what’s the best way to get the gradient from the loss to the weights?” “how do i increase my network’s capacity?”
also saw @santiagomedr demo moondream running blazing fast on rust using @huggingface's candle library
dario amodei wants me to delete this tweet because it discloses a compute multiplier, but i will not be silenced 😡
instead i will tell you that scaling by a factor of 4 instead of 8 will likely work even better
> go to SF because you’re only allowed to work on AI if you’re in SF
> RAG on the billboards
> RAG at every AI meetup
> someone broke into my car and left a flyer for their RAG company
> pay an extra 20% in taxes for the privilege
the xz vulnerability story is wild. they worked on the project for two years before injecting this attack. used sock puppets to pressure the previous maintainer into giving up control.
who has the resources to pull something like this off? what other projects may be compromised?
ceiling is being raised. cursor's copilot helped us write "superhuman code" for a critical feature. We can read this code, but VERY few engineers out there could write it from scratch.
Moondream now has bounding boxes!
@vikhyatk has created a vision language model that is both powerful and efficient. AI Tinkerers SF (running locally on laptop)