vik

@vikhyatk

Followers
7,371
Following
530
Media
664
Statuses
3,861

teaching computers how to see // prev: @awscloud

Seattle
Joined November 2008
Pinned Tweet
@vikhyatk
vik
17 days
this is a real, non-cherry-picked sample from my lnqa visual question answering dataset
Tweet media one
4
0
55
@vikhyatk
vik
3 months
wait so car keys don’t use asymmetric cryptography? you can unlock a car by just replaying the RF signals the key emits? my $20 raspberry pi zero has better security than my $20k car?
143
177
4K
@vikhyatk
vik
7 months
> doesn't say please and thank you to ChatGPT 🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩
89
268
2K
@vikhyatk
vik
2 months
Releasing moondream2 - a small, open-source, vision language model designed to run efficiently on edge devices. Clocking in at 1.8B parameters, moondream requires less than 5GB of memory to run in 16 bit precision.
Tweet media one
75
210
2K
@vikhyatk
vik
3 months
“you should perhaps consider joining us” while also having the most dysfunctional recruiting organization i’ve had the misfortune of interacting with
@sama
Sam Altman
3 months
openai is the most talented and nicest group of people i have ever seen in one place working on the hardest, most interesting, and most important problems with all the key resources in place extremely focused on making AGI you should perhaps considering joining us
2K
1K
23K
15
13
1K
@vikhyatk
vik
4 months
when she realizes your app runs a local vision language model instead of calling the GPT-4V api
Tweet media one
16
40
963
@vikhyatk
vik
4 months
Announcing moondream1: a tiny 1.6B parameter vision language model that punches above its weight
31
121
919
@vikhyatk
vik
6 months
work has been interesting lately
got dinged for scheduling an all-hands meeting because the phrase “all-hands” is ableist (not a joke, DM for proof)
then my GPU instance order was rejected because there’s no capacity (my job is training ML models)
Tweet media one
50
18
884
@vikhyatk
vik
2 months
the fifth law of thermodynamics states that mark zuckerberg always wins.
Tweet media one
15
48
884
@vikhyatk
vik
3 months
friday was my last day at AWS. I had a great 9 years and learned a lot but I’m excited to join the rest of society in complaining about AWS instead of defending it
27
16
842
@vikhyatk
vik
2 months
1,857,482,608 floating point numbers. 3.72 GB. Training this model cost more than my house. Excited to share it with everyone next week. :)
Tweet media one
21
31
819
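A quick check of the figures in the tweet above, as a minimal sketch: at two bytes per parameter (an assumption; the tweet does not state the precision), 1,857,482,608 parameters come to about 3.71 GB, close to the quoted 3.72 GB. The small difference is not explained in the source. The `BYTES_PER_PARAM` table and the `weight_bytes` helper are illustrative names, not part of any moondream code.

```python
# Rough memory estimate for storing model weights at a given precision.
PARAM_COUNT = 1_857_482_608  # parameter count quoted in the tweet

# Assumed storage width per parameter, in bytes (not stated in the tweet).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_bytes(n_params: int, dtype: str) -> int:
    """Return the raw size of the weights in bytes, ignoring file overhead."""
    return n_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        size = weight_bytes(PARAM_COUNT, dtype)
        # Report both decimal gigabytes and binary gibibytes.
        print(f"{dtype}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```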
@vikhyatk
vik
5 months
looking into buying hardware so i can run mixtral locally and cancel my $20/mo ChatGPT subscription. looks like i can expect to break even in 83 years
16
24
752
@vikhyatk
vik
7 months
@PicturesFoIder ask if he wants to pet my dog
Tweet media one
23
4
742
@vikhyatk
vik
4 months
it's time to build * * an analytics pipeline for an internal tool that will be deprecated in six months at your dystopian corporate job
9
44
727
@vikhyatk
vik
2 months
ChatGPT refuses to solve CAPTCHA images, but luckily it's super easy to fine-tune moondream to do it. I just released a notebook showing how to do this.
Tweet media one
18
50
715
@vikhyatk
vik
10 months
@ceadreams @latkedelrey i'll take 300 wishes so there's a 500% chance
6
1
681
@vikhyatk
vik
5 months
@notmybagman sorry but i will not be taking any complaints about ChatGPT's cooling water usage while we're still subsidizing cotton farming in the Arizona desert
17
5
591
@vikhyatk
vik
2 months
me explaining to my parents that I quit my sr management faang job to train open source models and shitpost on the internet
Tweet media one
6
19
555
@vikhyatk
vik
12 days
there is a company out there that spent $1.4B training a model you’ve never heard of because it was so bad. they had 16 people working on just the tokenizer.
18
21
545
@vikhyatk
vik
5 months
Implemented inference for the Mixtral 8x7B model. Requires ~100GB of VRAM, so you can definitely run it on an 8x3090 or 8x4090 instance. (GitHub link in thread)
Tweet media one
15
29
449
@vikhyatk
vik
16 days
if you are using LoRA: divide the A matrix learning rate by 8 and multiply the B matrix learning rate by 8. you can thank me later
Tweet media one
12
44
445
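One way the LoRA tip above could be applied, as a hedged sketch rather than the author's code: split parameters into groups by name and give each group its own learning rate. The substrings `lora_A` and `lora_B` are an assumption about how the adapter matrices are named (Hugging Face PEFT uses these names), and `lora_lr_groups` is a hypothetical helper. The returned list of `{"params": ..., "lr": ...}` dicts follows the per-parameter-group format that PyTorch optimizers accept.

```python
from typing import Dict, Iterable, List, Tuple


def lora_lr_groups(
    named_params: Iterable[Tuple[str, object]],
    base_lr: float,
    factor: float = 8.0,
) -> List[Dict]:
    """Build optimizer parameter groups with separate LoRA A/B learning rates."""
    a_params, b_params, other = [], [], []
    for name, param in named_params:
        if "lora_A" in name:      # assumed naming of the A matrices
            a_params.append(param)
        elif "lora_B" in name:    # assumed naming of the B matrices
            b_params.append(param)
        else:
            other.append(param)

    groups = [
        {"params": a_params, "lr": base_lr / factor},  # A: divide by factor
        {"params": b_params, "lr": base_lr * factor},  # B: multiply by factor
        {"params": other, "lr": base_lr},              # everything else unchanged
    ]
    # Drop empty groups so the optimizer is not handed empty parameter lists.
    return [g for g in groups if g["params"]]
```

With PyTorch installed, the result could be passed to an optimizer, for example `torch.optim.AdamW(lora_lr_groups(model.named_parameters(), base_lr=1e-4))`, where `model` is whatever LoRA-wrapped module is being trained.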
@vikhyatk
vik
19 days
i am going to make a GUI that looks like this to monitor my data processing pipelines
Tweet media one
46
12
419
@vikhyatk
vik
3 months
@glennchrpntr a problem for when i make enough money to have a garage...
1
1
422
@vikhyatk
vik
4 months
Created a Huggingface Space for moondream1 to make it easier to try out!
33
55
374
@vikhyatk
vik
4 months
Releasing moondream0 today - a small vision language model based on SigLIP, Phi-1.5 and the LLaVa training dataset. This demo shows the model running purely on CPU using ~8GB of RAM.
19
55
372
@vikhyatk
vik
2 months
can someone who is good with money help me balance my budget? i am currently funemployed and need to bring my burn rate down.
Rent ($1,800/mo) - $21,600
8xA100-40GB ($11/hr) - $96,624
Food ($600/mo) - $7,200
Annual Total - $125,424
45
10
362
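The arithmetic in the budget tweet above can be reproduced with a short sketch. The GPU line works out to $96,624 only for a 366-day year running around the clock, so the default `year=2024` (a leap year) is an assumption, as is the `annual_budget` helper itself.

```python
import calendar


def annual_budget(rent_per_month, gpu_per_hour, food_per_month, year=2024):
    """Return annual cost per line item plus the total, in dollars."""
    # Assumes the GPU instance runs 24 hours a day for the whole year.
    days = 366 if calendar.isleap(year) else 365
    items = {
        "rent": rent_per_month * 12,
        "gpu": gpu_per_hour * 24 * days,
        "food": food_per_month * 12,
    }
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    for name, cost in annual_budget(1800, 11, 600).items():
        print(f"{name}: ${cost:,}")
```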
@vikhyatk
vik
6 months
it is with a heavy heart that i’m announcing shutting down all of my AI projects. will be focusing exclusively on linear algebra and stochastic differential equations going forward.
10
17
354
@vikhyatk
vik
3 months
@realfastman and now you have to leave your car unlocked so they don’t break the windows, so i guess it still doesn’t matter
5
0
346
@vikhyatk
vik
11 months
@realism_fan it was cotton eyed joe wasn't it? where did he come from? where did he go?
2
2
334
@vikhyatk
vik
3 months
moondream1 inference is now available in @huggingface transformers!
Tweet media one
10
36
325
@vikhyatk
vik
2 months
the model is learning
Tweet media one
11
10
314
@vikhyatk
vik
2 months
They raised $1.3B at a $4B valuation less than a year ago.
Tweet media one
13
11
305
@vikhyatk
vik
3 months
mouser has ~1500 of the coral micro dev boards in stock now - comes with a camera, mic, and edge TPU. i got one for… uh, no reason in particular 😉
Tweet media one
20
8
299
@vikhyatk
vik
7 months
Cool paper - shows how to transfer knowledge from a teacher model to a student that is already pre-trained and may even outperform the teacher without loss of performance. Overcomes shortcomings of traditional distillation techniques that assume the student is untrained.
Tweet media one
3
35
300
@vikhyatk
vik
3 months
as a red blooded, freedom loving american i know which model i'm going to use in my apps
Tweet media one
12
17
293
@vikhyatk
vik
2 months
ever felt an emptiness in the pit of your stomach? that could only go away if you had a dataset with 1.5M question/answer pairs about images? if so, i'm here to help.
Tweet media one
13
15
294
@vikhyatk
vik
4 months
getting a lot of DMs asking how to get into computer vision. i am no expert, i can only share what i did:
1. follow @giffmana
2. read all of his papers
3. watch recordings of all of his talks on youtube
4. study every tweet he posts for extra alpha
8
9
279
@vikhyatk
vik
11 months
@sergeykarayev The average politician age charts look pretty correlated except offset by ~10 years.
Tweet media one
0
8
267
@vikhyatk
vik
2 months
we just got moondream running with llama.cpp! so quantized/GGUF versions should be out early next week hopefully!
12
23
265
@vikhyatk
vik
1 month
> wake up
> new 2.7B model, nice
> wait it's actually 14B, 2.7B is "activated" but i still need all 14B in VRAM
> benchmarks compare it to a 7B model
> ???
what is the use-case for small/medium scale MoE models? why wouldn't you use a dense model instead? (serious question)
27
7
266
@vikhyatk
vik
4 days
i have developed a new architecture that beats transformers on language modeling. i'm not going to release code, weights, or even a demo. you'll just have to trust me i uploaded a PDF to arxiv
11
8
280
@vikhyatk
vik
3 months
@yifever i applied a while ago and they ghosted me, which is not cool but ok you get a lot of applications understandable. but then they DM’d me after I released moondream asking if I was interested and then ghosted me again… wtf??
4
0
249
@vikhyatk
vik
3 months
[on first date]
her: so, what are you passionate about?
me: i’m writing a 6,000 word essay on how MoE models are going through a hype cycle. they’re useful when serving at scale but open source research should focus on bringing back ReLU because — wait, where are you going?
10
11
249
@vikhyatk
vik
7 months
Tweet media one
4
12
248
@vikhyatk
vik
10 months
I’ve been going through programming subreddits lately (looking for places to shill my AI code review product), and am starting to realize the future is not evenly distributed when it comes to AI-assisted programming. Huge gap in the willingness to seriously try out new tools.
Tweet media one
Tweet media two
31
7
243
@vikhyatk
vik
19 days
can't believe this app is just $8/mo
Tweet media one
9
6
245
@vikhyatk
vik
1 month
New moondream release out today! Mainly focused on improved OCR and captioning. If you're using moondream for image captioning definitely worth checking this one out!
Tweet media one
14
15
245
@vikhyatk
vik
6 months
pleased to announce that while everything else was going on, i successfully upgraded CUDA drivers from 11.4 to 12.2 today
Tweet media one
12
4
241
@vikhyatk
vik
3 months
Some notes on LLaVA-1.6: 1/ To increase image resolution without retraining the vision encoder, they feed in five crops of the image. This improves performance, but comes with additional computational cost due to increased image tokens (from 576 to 2144).
Tweet media one
5
18
237
@vikhyatk
vik
2 months
Just released a new revision of moondream2! ✅ Improved benchmark scores and instruction following ✅ Batch inference ✅ Support Flash Attention 2.0 for the text model
Tweet media one
11
20
228
@vikhyatk
vik
5 months
Mistral's stated goal for this model (according to their pitch deck) was to beat ChatGPT 3.5 by a large margin.
Tweet media one
3
8
214
@vikhyatk
vik
2 months
seeing moondream trending on github is the only thing that brings me out of my seasonal affective disorder fugue. thank you all for the support! new improved version should be out later today!
Tweet media one
10
11
209
@vikhyatk
vik
2 months
moondream finetuning can run on a free colab notebook! try it out and show me your finetunes!
@vikhyatk
vik
2 months
ChatGPT refuses to solve CAPTCHA images, but luckily it's super easy to fine-tune moondream to do it. I just released a notebook showing how to do this.
Tweet media one
18
50
715
7
17
204
@vikhyatk
vik
1 month
working on a mamba mixture of experts diffusion qlora 1.58bit model trained using jax, rust, go, triton, dpo, and rag
24
5
204
@vikhyatk
vik
3 months
should i start charging $20/mo for the moondream space? looks like it's better than Gemini Ultra...
Tweet media one
@Suhail
Suhail
3 months
Well, at least they shipped I guess. (yes, this is the $20 Gemini)
Tweet media one
80
95
2K
9
11
198
@vikhyatk
vik
4 months
people who ask how a scrappy startup can win if a big company decides to compete with you understand nothing about startups and big companies, and are fundamentally unserious
6
6
194
@vikhyatk
vik
3 months
I hate the phrase “trivial to build” - it’s always said by someone who builds nothing. Building is hard. Building is expensive. Building is impossible. Anything that’s built is a miracle.
8
19
186
@vikhyatk
vik
3 months
my brain is broken. my first reaction to this was “that’s a nice loss curve” (in unrelated news, i am now a $GOOG shareholder)
Tweet media one
14
3
188
@vikhyatk
vik
2 months
the fact that VCs think it’s clever to ask for your secret sauce in the first call when you know they're invested in a competitor is really the core of the innovation economy
8
6
190
@vikhyatk
vik
7 months
@chinesegon skeptical that anyone can live off investment returns with just $2M. assuming 8% returns and ignoring inflation that's $160K/yr. doesn't even cover my doordash bill. :(
10
1
177
@vikhyatk
vik
2 months
i believe people are fundamentally good, and that AI tools should simply do what their users request instead of returning condescending responses about what's right or wrong
17
15
179
@vikhyatk
vik
11 months
@rishmishra shareholders saw these tiktoks and demanded immediate layoffs.
1
1
170
@vikhyatk
vik
2 months
just to clarify, moondream2 is actually open source. apache 2.0. no weird non-standard licensing terms. you can do whatever you want with it. it's probably already pre-approved by your company's legal department.
6
12
172
@vikhyatk
vik
2 months
any seattle friends interested in building this drone and seeing if we can get it to fly with just vision input, instead of the usual accelerometer/gyro PID controller? 🥺👉👈
Tweet media one
40
7
165
@vikhyatk
vik
3 months
i have not been at peace ever since i dug into some of the examples in the VQAv2 validation set
Tweet media one
7
9
162
@vikhyatk
vik
2 months
anthropic: “We believe that companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles.” also anthropic:
Tweet media one
10
7
160
@vikhyatk
vik
21 days
damn
Tweet media one
8
1
156
@vikhyatk
vik
3 months
a lot of ai companies today are started because someone wanted to train a big model. building a successful business seems secondary in a lot of cases
8
3
157
@vikhyatk
vik
10 months
@kentcdodds no need for auth when you have no users
1
1
153
@vikhyatk
vik
2 months
digging into the grok tokenizer... looks like it has vocab size 131072 and splits digits into individual tokens
Tweet media one
6
8
157
@vikhyatk
vik
15 days
told my friend I was gonna spend the day replicating some arxiv paper and she said "what a rich and fulfilling life you lead"
11
2
154
@vikhyatk
vik
15 days
Why this works: for effective feature learning in neural networks using an Adam optimizer, learning rate needs to be inversely proportional to the width (a.k.a. model dimension) when your width is large. (screenshot from Tensor Programs V)
Tweet media one
@vikhyatk
vik
16 days
if you are using LoRA: divide the A matrix learning rate by 8 and multiply the B matrix learning rate by 8. you can thank me later
Tweet media one
12
44
445
3
19
153
@vikhyatk
vik
2 months
@inflectionAI did the user consent to this investigation? just curious what level of privacy i can expect when using Pi
15
4
149
@vikhyatk
vik
2 months
you’d think this is the exact scenario where one would want a local model instead of calling OpenAI. who wants their production line to grind to a halt because the factory’s internet connection was flaky?
@Figure_robot
Figure
2 months
With OpenAI, Figure 01 can now have full conversations with people -OpenAI models provide high-level visual and language intelligence -Figure neural networks deliver fast, low-level, dexterous robot actions Everything in this video is a neural network:
1K
5K
17K
15
4
148
@vikhyatk
vik
29 days
the rustaceans are trying to cancel me but i will not be silenced from speaking my truth!!
Tweet media one
@vikhyatk
vik
30 days
everyone complains about openai doing regulatory capture but apparently it's ok when rust does the same
5
2
55
6
4
144
@vikhyatk
vik
3 months
very little alpha in reading arxiv papers these days because the best insights are kept proprietary. luckily there's still tons of alpha in reading soviet papers from the 1970s
Tweet media one
5
5
142
@vikhyatk
vik
2 months
if your model costs $5-10B to train maybe you should consider writing some triton kernels or switching to jax.
@vkhosla
Vinod Khosla
2 months
Is open sourced @elonmusk ? A typical model in 2025 will cost $5-10b to train. Good business to open source it @pmarca ?
146
44
955
4
2
144
@vikhyatk
vik
4 months
/r/LocalLLaMA vibe check (the only benchmark I trust) looking good! ✅
Tweet media one
5
5
142
@vikhyatk
vik
2 months
never trust numbers in model names
claims to be 1.3B parameters? may actually have anywhere from 1.4B to 1.9B parameters
claims to take 384x384 images? the correct size is probably actually 378x378
8
6
142
@vikhyatk
vik
3 months
seeing people demo chatgpt wrappers at ai meetups i got rejected from is a bit of a blackpill ngl
10
0
140
@vikhyatk
vik
17 days
We can say right now, with a high degree of scientific certainty, moondream3 is going to be a lot smarter than moondream2 and moondream4 will be a lot smarter than moondream3, we are not near the top of this curve.
14
7
142
@vikhyatk
vik
9 months
Really enjoying working through this book (Principles of Deep Learning Theory). It’s hard work but the payoff is worth it.
Tweet media one
7
10
135
@vikhyatk
vik
1 month
me after trying out an H100 for the first time last night
Tweet media one
6
8
137
@vikhyatk
vik
2 months
is today launch day? or will i start another training run?
Tweet media one
12
5
134
@vikhyatk
vik
16 days
went to an ai meetup today, all the questions were like “what’s the best way to get the gradient from the loss to the weights?” “how do i increase my network’s capacity?” also saw @santiagomedr demo moondream running blazing fast on rust using @huggingface ’s candle library
Tweet media one
Tweet media two
6
7
135
@vikhyatk
vik
1 month
i would like to nominate a replacement
Tweet media one
@MikePFrank
Michael P. Frank is joining a startup!
2 months
Just FYI, computer vision papers submitted to IEEE that include this image of Ms. Forsén will no longer be considered for publication
Tweet media one
107
347
2K
3
5
135
@vikhyatk
vik
16 days
dario amodei wants me to delete this tweet because it discloses a compute multiplier, but i will not be silenced 😡 instead i will tell you that scaling by a factor of 4 instead of 8 will likely work even better
@vikhyatk
vik
16 days
if you are using LoRA: divide the A matrix learning rate by 8 and multiply the B matrix learning rate by 8. you can thank me later
Tweet media one
12
44
445
6
6
135
@vikhyatk
vik
15 days
fear not the person who has trained 10,000 models once, fear the person who has trained one model 10,000 times
4
8
134
@vikhyatk
vik
11 months
Tweet media one
1
1
131
@vikhyatk
vik
6 months
yes, the diffusion model training run is going well. why do you ask?
Tweet media one
12
1
145
@vikhyatk
vik
4 months
the open air GPU rig with 5 3090s in my living room is more powerful than the world's most powerful supercomputer 20 years ago
9
2
128
@vikhyatk
vik
4 months
funny how gpt2 (1.5B parameters) was deemed too dangerous to release and today four years later i shipped a 1.6B param model that’s considered “tiny”
2
4
130
@vikhyatk
vik
21 days
> go to SF because you’re only allowed to work on AI if you’re in SF
> RAG on the billboards
> RAG at every AI meetup
> someone broke into my car and left a flyer for their RAG company
> pay an extra 20% in taxes for the privilege
@atroyn
anton (𝔴𝔞𝔯𝔱𝔦𝔪𝔢)
21 days
type of guy who thinks ai is going to change the world but won’t move to san francisco because he heard it’s dirty and dangerous
33
11
265
5
4
127
@vikhyatk
vik
3 months
STOP SCALING YOUR ATTENTION LOGITS BY 1/√D, IT'S BEEN ALMOST TWO YEARS SINCE μP SHOWED US THAT IT SHOULD BE SCALED INSTEAD BY 1/D
Tweet media one
5
3
128
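To make the difference between the two scalings in the tweet above concrete, here is a minimal pure-Python sketch of single-query attention weights under each choice. It illustrates the formulas only and is not taken from any μP implementation; `attention_weights`, `standard_scale`, and `mup_scale` are hypothetical names.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, scale):
    """Softmax over scaled dot products between one query and each key."""
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(logits)


def standard_scale(d):
    return 1.0 / math.sqrt(d)  # the usual 1/sqrt(d) scaling


def mup_scale(d):
    return 1.0 / d  # the 1/d scaling the tweet attributes to muP


if __name__ == "__main__":
    d = 64
    query = [1.0] * d
    keys = [[1.0] * d, [0.0] * d]  # one aligned key, one orthogonal key
    print("1/sqrt(d):", attention_weights(query, keys, standard_scale(d)))
    print("1/d:      ", attention_weights(query, keys, mup_scale(d)))
```

In this toy case the raw dot product grows linearly with `d`, so the 1/sqrt(d) logits still grow with width while the 1/d logits stay constant.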
@vikhyatk
vik
21 days
Visualizing the expressive power of different MLP activation functions... interesting how SiLU seems to converge faster than GELU.
5
3
127
@vikhyatk
vik
1 month
the xz vulnerability story is wild. they worked on the project for two years before injecting this attack. used sock puppets to pressure the previous maintainer into giving up control. who has the resources to pull something like this off? what other projects may be compromised?
7
7
125
@vikhyatk
vik
2 months
ceiling is being raised. cursor's copilot helped us write "superhuman code" for a critical feature. We can read this code, but VERY few engineers out there could write it from scratch.
Tweet media one
8
5
123
@vikhyatk
vik
12 days
Your Vision Language Model is Secretly a Bounding Box Predictor
@jheitzeb
Joe Heitzeberg
12 days
Moondream now has bounding boxes! @vikhyatk has created a vision language model that is both powerful and efficient. AI Tinkerers SF (running locally on laptop)
1
4
42
3
6
125
@vikhyatk
vik
4 months
if she doesn’t love you when you’re an lstm she doesn’t deserve you when you become a transformer
6
13
120
@vikhyatk
vik
3 months
i am no longer eligible for the forbes 30 under 30 list, but it’s okay because my mom said she doesn’t want me on that list anyway
23
1
119