wait so car keys don’t use asymmetric cryptography? you can unlock a car by just replaying the RF signals the key emits? my $20 raspberry pi zero has better security than my $20k car?
Releasing moondream2 - a small, open-source, vision language model designed to run efficiently on edge devices. Clocking in at 1.8B parameters, moondream requires less than 5GB of memory to run in 16-bit precision.
openai is the most talented and nicest group of people i have ever seen in one place
working on the hardest, most interesting, and most important problems
with all the key resources in place
extremely focused on making AGI
you should perhaps consider joining us
work has been interesting lately
got dinged for scheduling an all-hands meeting because the phrase “all-hands” is ableist (not a joke, DM for proof)
then my GPU instance order was rejected because there’s no capacity (my job is training ML models)
friday was my last day at AWS. I had a great 9 years and learned a lot but I’m excited to join the rest of society in complaining about AWS instead of defending it
ChatGPT refuses to solve CAPTCHA images, but luckily it's super easy to fine-tune moondream to do it. I just released a notebook showing how to do this.
@notmybagman sorry but i will not be taking any complaints about ChatGPT's cooling water usage while we're still subsidizing cotton farming in the Arizona desert
there is a company out there that spent $1.4B training a model you’ve never heard of because it was so bad. they had 16 people working on just the tokenizer.
Implemented inference for the Mixtral 8x7B model. Requires ~100GB of VRAM, so you can definitely run it on an 8x3090 or 8x4090 instance.
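A rough back-of-envelope for the ~100GB figure, assuming Mixtral's publicly reported ~46.7B total parameters (the experts share attention layers, so it's well under 8 × 7B) at 2 bytes per parameter; activation and KV-cache overhead ignored:

```python
# Weight memory for Mixtral 8x7B in 16-bit (fp16/bf16) precision.
total_params = 46.7e9      # reported total parameter count (assumption)
bytes_per_param = 2        # 16-bit precision
weights_gb = total_params * bytes_per_param / 1e9  # ~93 GB

# Eight 24GB consumer cards (3090/4090) total 192 GB of VRAM,
# leaving headroom for activations and the KV cache.
cluster_vram_gb = 8 * 24

print(f"weights: ~{weights_gb:.0f} GB of {cluster_vram_gb} GB available")
```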
(GitHub link in thread)
Releasing moondream0 today - a small vision language model based on SigLIP, Phi-1.5 and the LLaVa training dataset. This demo shows the model running purely on CPU using ~8GB of RAM.
can someone who is good with money help me balance my budget? i am currently funemployed and need to bring my burn rate down.
Rent ($1,800/mo) - $21,600
8xA100-40GB ($11/hr) - $96,624
Food ($600/mo) - $7,200
Annual Total - $125,424
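The line items above do check out; a quick sanity pass (note the GPU figure implies 8,784 billable hours, i.e. a 366-day leap year of 24/7 utilization):

```python
rent = 1_800 * 12        # $21,600/yr
food = 600 * 12          # $7,200/yr
gpu_hours = 366 * 24     # 24/7 for a leap year = 8,784 hours
gpus = 11 * gpu_hours    # $96,624 at $11/hr for the 8xA100-40GB node
total = rent + food + gpus
print(total)  # 125424
```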
it is with a heavy heart that i’m announcing shutting down all of my AI projects. will be focusing exclusively on linear algebra and stochastic differential equations going forward.
Cool paper - shows how to transfer knowledge from a teacher model to a student that is already pre-trained, without degrading the student's existing capabilities. The student may even end up outperforming the teacher.
Overcomes a shortcoming of traditional distillation techniques, which assume the student is untrained.
ever felt an emptiness in the pit of your stomach? that could only go away if you had a dataset with 1.5M question/answer pairs about images? if so, i'm here to help.
getting a lot of DMs asking how to get into computer vision. i am no expert, i can only share what i did:
1. follow @giffmana
2. read all of his papers
3. watch recordings of all of his talks on youtube
4. study every tweet he posts for extra alpha
> wake up
> new 2.7B model, nice
> wait it's actually 14B, 2.7B is "activated" but i still need all 14B in VRAM
> benchmarks compare it to a 7B model
> ???
what is the use-case for small/medium scale MoE models? why wouldn't you use a dense model instead? (serious question)
i have developed a new architecture that beats transformers on language modeling. i'm not going to release code, weights, or even a demo. you'll just have to trust me i uploaded a PDF to arxiv
@yifever
i applied a while ago and they ghosted me, which is not cool but ok you get a lot of applications understandable. but then they DM’d me after I released moondream asking if I was interested and then ghosted me again… wtf??
[on first date]
her: so, what are you passionate about?
me: i’m writing a 6,000 word essay on how MoE models are going through a hype cycle. they’re useful when serving at scale but open source research should focus on bringing back ReLU because — wait, where are you going?
I’ve been going through programming subreddits lately (looking for places to shill my AI code review product), and am starting to realize the future is not evenly distributed when it comes to AI-assisted programming.
Huge gap in the willingness to seriously try out new tools.
New moondream release out today! Mainly focused on improved OCR and captioning. If you're using moondream for image captioning definitely worth checking this one out!
Some notes on LLaVA-1.6:
1/ To increase image resolution without retraining the vision encoder, they feed in five crops of the image. This improves performance, but comes with additional computational cost due to increased image tokens (from 576 to 2144).
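The cost jump in that note can be made concrete. A 336px image through a ViT-L/14 encoder yields 24×24 = 576 patch tokens; with the five-crop scheme the post cites 2144 image tokens, and since attention cost over those tokens grows roughly quadratically, the compute increase is steeper than the token increase (a rough sketch, ignoring text tokens and any token-merging details):

```python
# Token counts from the LLaVA-1.6 note above.
base_tokens = 24 * 24   # 336px / 14px patches -> 576 tokens
crop_tokens = 2144      # five-crop scheme, as cited in the post

token_ratio = crop_tokens / base_tokens       # ~3.72x more image tokens
attn_ratio = token_ratio ** 2                 # ~13.9x more attention FLOPs
print(f"{token_ratio:.2f}x tokens, ~{attn_ratio:.1f}x attention cost")
```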
Just released a new revision of moondream2!
✅ Improved benchmark scores and instruction following
✅ Batch inference
✅ FlashAttention-2 support for the text model
seeing moondream trending on github is the only thing that brings me out of my seasonal affective disorder fugue. thank you all for the support!
new improved version should be out later today!
people who ask how a scrappy startup can win if a big company decides to compete with you understand nothing about startups and big companies, and are fundamentally unserious
I hate the phrase “trivial to build” - it’s always said by someone who builds nothing. Building is hard. Building is expensive. Building is impossible. Anything that’s built is a miracle.
the fact that VCs think it’s clever to ask for your secret sauce in the first call when you know they're invested in a competitor is really the core of the innovation economy
@chinesegon skeptical that anyone can live off investment returns with just $2M. assuming 8% returns and ignoring inflation that's $160K/yr. doesn't even cover my doordash bill. :(
i believe people are fundamentally good, and that AI tools should simply do what their users request instead of returning condescending responses about what's right or wrong
just to clarify, moondream2 is actually open source. apache 2.0. no weird non-standard licensing terms.
you can do whatever you want with it. it's probably already pre-approved by your company's legal department.
any seattle friends interested in building this drone and seeing if we can get it to fly with just vision input, instead of the usual accelerometer/gyro PID controller?
🥺👉👈
anthropic: “We believe that companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles.”
also anthropic:
Why this works: for effective feature learning in neural networks using an Adam optimizer, learning rate needs to be inversely proportional to the width (a.k.a. model dimension) when your width is large.
(screenshot from Tensor Programs V)
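A minimal sketch of that rule (the 1/width learning-rate scaling for hidden-layer weights under muP / Tensor Programs V); the base width and base learning rate here are hypothetical values you'd tune once on a small proxy model:

```python
def mup_lr(base_lr: float, base_width: int, width: int) -> float:
    """Scale the Adam learning rate for hidden-layer weights
    inversely with model width, per muP (Tensor Programs V)."""
    return base_lr * base_width / width

# Tune at a small proxy width, then transfer to larger widths.
print(mup_lr(1e-3, base_width=256, width=1024))  # 0.00025
print(mup_lr(1e-3, base_width=256, width=4096))  # 6.25e-05
```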
you’d think this is the exact scenario where one would want a local model instead of calling OpenAI.
who wants their production line to grind to a halt because the factory’s internet connection was flaky?
With OpenAI, Figure 01 can now have full conversations with people
-OpenAI models provide high-level visual and language intelligence
-Figure neural networks deliver fast, low-level, dexterous robot actions
Everything in this video is a neural network:
very little alpha in reading arxiv papers these days because the best insights are kept proprietary. luckily there's still tons of alpha in reading soviet papers from the 1970s
never trust numbers in model names
claims to be 1.3B parameters? may actually have anywhere from 1.4B to 1.9B parameters
claims to take 384x384 images? the correct size is probably actually 378x378
We can say right now, with a high degree of scientific certainty, moondream3 is going to be a lot smarter than moondream2 and moondream4 will be a lot smarter than moondream3, we are not near the top of this curve.
went to an ai meetup today, all the questions were like “what’s the best way to get the gradient from the loss to the weights?” “how do i increase my network’s capacity?”
also saw @santiagomedr demo moondream running blazing fast on rust using @huggingface's candle library
dario amodei wants me to delete this tweet because it discloses a compute multiplier, but i will not be silenced 😡
instead i will tell you that scaling by a factor of 4 instead of 8 will likely work even better
> go to SF because you’re only allowed to work on AI if you’re in SF
> RAG on the billboards
> RAG at every AI meetup
> someone broke into my car and left a flyer for their RAG company
> pay an extra 20% in taxes for the privilege
the xz vulnerability story is wild. they worked on the project for two years before injecting this attack. used sock puppets to pressure the previous maintainer into giving up control.
who has the resources to pull something like this off? what other projects may be compromised?
ceiling is being raised. cursor's copilot helped us write "superhuman code" for a critical feature. We can read this code, but VERY few engineers out there could write it from scratch.
Moondream now has bounding boxes!
@vikhyatk has created a vision language model that is both powerful and efficient. AI Tinkerers SF (running locally on laptop)