Aydyn Tairov

@tairov

Followers: 1,004
Following: 247
Media: 130
Statuses: 1,020

Production Engineer / ex-Meta

London
Joined March 2009
@tairov
Aydyn Tairov
9 months
I was really excited that Mojo became publicly available and was thinking about which project I could implement to learn Mojo concepts. Since I had already ported llama2.c to pure Python, I decided: why not try porting it to Mojo now 😀 And here is what I got...
10
66
573
@tairov
Aydyn Tairov
7 months
The out-of-the-box features of @Modular_AI 's Mojo are just incredible. We applied unrolling and now llama2.🔥 outperforms @ggerganov 's llama.cpp by almost 20% in CPU inference speed.
Tweet media one
11
49
364
@tairov
Aydyn Tairov
6 months
BREAKING: I've implemented a prototype of a cutting-edge Q-Learning algorithm in Mojo 🔥, and now it runs 35,000x faster than any existing (!) implementation. Thanks to Mojo's incredible feature that allows transparently importing any Python modules! I really hope this
Tweet media one
14
32
239
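For readers curious what the prototype above refers to: here is a minimal tabular Q-learning sketch in plain Python (not the Mojo port); the toy chain environment and the hyperparameters are illustrative assumptions.

```python
import random

N_STATES = 6
ACTIONS = (-1, +1)                     # step left / step right on a chain
ALPHA, GAMMA, EPS, EPISODES = 0.1, 0.9, 0.1, 500

Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

def step(state, action_idx):
    nxt = max(0, min(N_STATES - 1, state + ACTIONS[action_idx]))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0   # reward only at the right end
    return nxt, reward, nxt == N_STATES - 1

for _ in range(EPISODES):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        # classic Q-learning update: Q(s,a) += alpha * (target - Q(s,a))
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

print([round(max(row), 3) for row in Q])   # learned state values along the chain
```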
@tairov
Aydyn Tairov
9 months
llama2 inference in pure Mojo 🔥 I found Mojo's SIMD primitives a really interesting feature, since they helped improve the pretty awful performance of the Python solution by almost 250x.
2
32
183
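To illustrate why the pure-Python baseline is so slow and what vectorization buys: a rough sketch contrasting an interpreted matvec loop with a vectorized one (NumPy stands in for Mojo's SIMD here; the matrix size is arbitrary).

```python
import time
import numpy as np

d = 512
W = np.random.rand(d, d)
x = np.random.rand(d)

def matvec_python(W_rows, x_vals):
    # the naive per-element loops that dominate llama2.py-style inference
    out = [0.0] * len(W_rows)
    for i, row in enumerate(W_rows):
        s = 0.0
        for a, b in zip(row, x_vals):
            s += a * b
        out[i] = s
    return out

t0 = time.perf_counter()
matvec_python(W.tolist(), x.tolist())
t1 = time.perf_counter()

t2 = time.perf_counter()
_ = W @ x                              # vectorized path (BLAS / SIMD under the hood)
t3 = time.perf_counter()

print(f"python loops: {t1 - t0:.4f}s, vectorized: {t3 - t2:.6f}s")
```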
@tairov
Aydyn Tairov
8 months
Llama2.mojo performance on Mac is right up there with llama.cpp (!!!), and even outperforms plain C in many cases. This is insane!
Tweet media one
Tweet media two
Tweet media three
8
22
148
@tairov
Aydyn Tairov
7 months
@tldraw INSANE !
5
13
122
@tairov
Aydyn Tairov
9 months
5
10
111
@tairov
Aydyn Tairov
9 months
Internally I used vectorisation helpers for matmul, so the Mojo solution can now beat the original llama2.c by @karpathy by 20%. I think there is still some room for further improvement.
3
3
63
@tairov
Aydyn Tairov
8 months
I've got early access to the Mojo SDK 🔥 for Mac from @Modular_AI . And of course I've always wanted to run baby llama inference in pure Mojo on Apple Silicon.. True story, not only on Mojo 😉 So far, the results are mind-blowing! Here are some benchmarks
6
3
38
@tairov
Aydyn Tairov
7 months
Another milestone! We're baselining Mojo with Mojo 😉
Tweet media one
@Modular
Modular
7 months
Mojo 🔥 0.5.0 is released! 🚀 Even more epic updates unleashed! 😱 Checkout the highlights ⬇️ or read the full changelog here & happy weekend hacking! 👩🏼‍💻 ➡️
Tweet media one
5
34
211
1
2
36
@tairov
Aydyn Tairov
7 months
GitHub's integration of GPU-enabled M1 Apple Silicon hosts for Actions runners may have flown under the radar, but its implications are vast. It's a strong indicator of Apple Silicon's rising adoption among developers, hinting at a future where M2 and M3 become central to the
Tweet media one
1
3
24
@tairov
Aydyn Tairov
8 months
Thanks @Modular_AI , 🔥 finally my open-source contributions have materialized in the form of a nice colorful merch T-shirt 😎 👕
Tweet media one
Tweet media two
1
2
24
@tairov
Aydyn Tairov
9 months
Wow! This is exciting! Thanks @Modular_AI for appreciating my efforts & congrats on the public release of Mojo! Seems that my port of llama2 inference is truly a "First prober ai written in Mojo" 😀
@Modular
Modular
9 months
The first crack at llama2.🔥 is here 🚀 A Mojo 🔥 community member - Mojician - did a simple port from Python to Mojo, and shows its already 20% faster than Karpathys llama.c implementation 😱 How much faster can it go? 📈
27
191
1K
0
1
21
@tairov
Aydyn Tairov
7 months
@Modular_AI Nice work! llama2.🔥 bumped the Mojo version to 0.5.0, and we got a few % boost.
Tweet media one
0
0
20
@tairov
Aydyn Tairov
9 months
@andrew_n_carr Oh, man! I had so much fun. I encourage you to finish & share it anyway. I'd love to check it out!
0
0
20
@tairov
Aydyn Tairov
7 months
@Modular_AI Exciting! I got early access to the Mojo SDK (Mac) a week ago and compared its performance on baby-llama inference: Mojo vs Rust, C, C++, Go, Zig, and Julia. In total, 12 implementations in 7 languages x 3 models x 30 rounds. Check this out
2
4
21
@tairov
Aydyn Tairov
7 months
Shock upon shock! I appreciate @lexfridman highlighting our efforts – exciting to receive such feedback!
@lexfridman
Lex Fridman
7 months
@clattner_llvm Wow, impressive!
4
1
38
1
2
18
@tairov
Aydyn Tairov
7 months
Another moment of fame, now on a livestream with the awesome @Modular_AI team 😎! Really exciting week!
Tweet media one
1
3
16
@tairov
Aydyn Tairov
8 months
Baking something cool.. Now on Mac 👨‍💻
1
2
15
@tairov
Aydyn Tairov
8 months
I have the honor of authoring the first-ever guest post on the Modular AI blog, about my journey porting #llama2 inference to the #mojo lang. Kudos to @shshnkp for the incredible cooperation and support in preparing this article!
@Modular
Modular
8 months
New blog post by Mojician 🔥 and guest contributor @tairov 💯⬇️ Aydyn discusses his journey from discovering Mojo 🔥 to implementing llama2.🔥 which has over 1.2k stars 🤩 on GitHub! 🚀
1
8
72
1
1
15
@tairov
Aydyn Tairov
8 months
It'll disrupt not just AI/ML development industry, but much more.. It's a huuge story
1
0
13
@tairov
Aydyn Tairov
9 months
Thanks to a PR from a Modular team member, parallelize now works in llama2.🔥 Why not compare parallel execution with llama2.c? And... llama2.c strikes back, now with OMP...
1
1
12
@tairov
Aydyn Tairov
6 months
👀 "MLX has a Python API which closely follows NumPy. MLX also has a fully featured C++ API which closely mirrors the Python API" Yet another attempt to fix ML model usability by implementing Python libs written in C++. This time from Apple research
@awnihannun
Awni Hannun
6 months
Just in time for the holidays, we are releasing some new software today from Apple machine learning research. MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!) Code: Docs:
100
711
4K
3
0
12
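A minimal sketch of the NumPy-like MLX API quoted above, assuming `pip install mlx` on Apple silicon; the array shapes are arbitrary.

```python
import mlx.core as mx

a = mx.random.normal((256, 256))
b = mx.random.normal((256, 256))

c = mx.matmul(a, b)    # reads just like numpy.matmul
mx.eval(c)             # MLX is lazy; eval() forces the computation
print(c.shape, c.dtype)
```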
@tairov
Aydyn Tairov
7 months
From slowest to fastest — my Python and Mojo 🔥 ports of @karpathy llama2.c interestingly went to opposite ends of the perf spectrum. Recently I got a PR for #python using pypy & codon compilation on llama2-py that boosted it ~50x!
2
0
11
@tairov
Aydyn Tairov
8 months
The repo got 1K stars. You truly understand you have made a meaningful contribution to the #mojo community when GitHub offers to nominate a successor
Tweet media one
Tweet media two
0
0
10
@tairov
Aydyn Tairov
7 months
whisper.cpp now supports HF distil-whisper + full CUDA & Apple Metal offloading, which brings almost a 4x boost to transcribing with the fp model
Tweet media one
@ggerganov
Georgi Gerganov
7 months
whisper.cpp v1.5.0
15
74
603
0
1
11
@tairov
Aydyn Tairov
6 months
Despite my best efforts to attend #ModCon onsite this year, I sadly couldn't make it happen. But the event can't truly go on without the first ever Mojician v-attendance! 😅 Wishing everyone an insightful conference!
@Muhtasham9
muhtasham
6 months
Tweet media one
0
0
1
0
1
10
@tairov
Aydyn Tairov
7 months
Seems that @samlakig took this challenge very seriously. I'm eager to see the details of what you can accomplish!
@samlakig
sam laki (e^-λ)
7 months
I think I'll need to output more debug symbols/ do you think better to read the report straight from perf
Tweet media one
1
0
17
1
1
10
@tairov
Aydyn Tairov
8 months
I tried to optimize all the #llama2c ports for max performance. Some don't support multithreading so the comparisons aren't completely apples-to-apples. But it's clear Mojo is here to stay
1
0
9
@tairov
Aydyn Tairov
6 months
It turns out that Mistral's team is literally not using Mojo to speed up training and inference 35,000x, and they raised €120+M. Investors, what are you doing 😢 this is horrendous!
@jeremyphoward
Jeremy Howard
6 months
Mistral's team literally just used learned weights over data instead of programmed rules and raised 120+M. Investors what are you doing 😢 this is horrendous
20
31
722
2
2
10
@tairov
Aydyn Tairov
8 months
I secured early access to the Mojo SDK on Mac before the general release. Put all #llama2c ports through extensive benchmarks across 7 languages and 12 variations. Crafted a custom benchmarking framework to test performance. Quite an intriguing battle on the M1 Mac - the results are telling 👇
1
0
9
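A minimal sketch of what such a tok/s benchmark harness can look like in Python; the binary paths, model file, and output format are assumptions for illustration, not the actual framework.

```python
import re
import statistics
import subprocess
import time

RUNS = 30
PORTS = {
    # hypothetical local paths / invocations for two of the ports
    "llama2.c": ["./run", "stories15M.bin"],
    "llama2.mojo": ["mojo", "llama2.mojo", "stories15M.bin"],
}

def bench(cmd, runs=RUNS):
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        elapsed = time.perf_counter() - start
        # assumes the port reports how many tokens it generated; adjust per port
        m = re.search(r"(\d+)\s*tokens", out.stdout)
        if m:
            rates.append(int(m.group(1)) / elapsed)
    return statistics.median(rates) if rates else float("nan")

for name, cmd in PORTS.items():
    print(f"{name}: {bench(cmd):.1f} tok/s (median of {RUNS} runs)")
```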
@tairov
Aydyn Tairov
7 months
Interestingly, within just 2 weeks, we are no longer using the C implementation as a baseline. I'm sure there are still discoveries to be made 💥
0
1
9
@tairov
Aydyn Tairov
8 months
I built a test env to benchmark all @karpathy 's #llama2c ports, including Mojo, Zig, Julia, Rust & #llamacpp converter by @ggerganov . Ran inference across 3 baby llama models in 30 rounds (multi/single threaded). Check out the full report
1
0
8
@tairov
Aydyn Tairov
6 months
It seems that the Mistral team is setting a new trend in open-source LLMs: MoE — mixture of experts. If I understand correctly, GPT-4's impressive performance largely stems from a similar technique. In the backend, it features eight 'heads' or 'experts', each said to be 250 billion parameters
@karpathy
Andrej Karpathy
6 months
New open weights LLM from @MistralAI params.json: - hidden_dim / dim = 14336/4096 => 3.5X MLP expand - n_heads / n_kv_heads = 32/8 => 4X multiquery - "moe" => mixture of experts 8X top 2 👀 Likely related code: Oddly absent: an over-rehearsed
Tweet media one
90
608
5K
0
0
8
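A toy NumPy sketch of the top-2 mixture-of-experts routing described above (8 experts, 2 active per token); the shapes and gating scheme are simplified illustrations, not Mixtral's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, d_ff = 8, 16, 64

# each "expert" is a tiny two-layer MLP; the router (gate) scores experts per token
W1 = rng.standard_normal((n_experts, d_model, d_ff)) * 0.02
W2 = rng.standard_normal((n_experts, d_ff, d_model)) * 0.02
Wg = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x):                       # x: one token embedding, shape (d_model,)
    logits = x @ Wg                     # router scores for all 8 experts
    top2 = np.argsort(logits)[-2:]      # indices of the 2 highest-scoring experts
    weights = np.exp(logits[top2])
    weights /= weights.sum()            # softmax over the selected experts only
    out = np.zeros_like(x)
    for w, e in zip(weights, top2):     # only 2 of the 8 experts run per token
        out += w * (np.maximum(x @ W1[e], 0.0) @ W2[e])
    return out

print(moe_layer(rng.standard_normal(d_model)).shape)   # (16,)
```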
@tairov
Aydyn Tairov
6 months
Next move by 4D chess grandmaster?
Tweet media one
2
0
8
@tairov
Aydyn Tairov
9 months
@aniketvartak @Modular_AI @karpathy The thing is, llama2.mojo was also implemented to understand Mojo concepts and to have a real-world example. However, that doesn't mean both projects couldn't evolve further to squeeze everything you can out of the hardware while trying to maintain brevity
1
0
8
@tairov
Aydyn Tairov
7 months
Kudos to the @ziglang community for improving and benchmarking llama2 inference in Zig on Apple M1/M2! The llama2.zig implementation has solid single-threaded performance - it may be the fastest single-threaded inference of tiny-llama models so far on Macs. Surprisingly, no
Tweet media one
@tairov
Aydyn Tairov
8 months
Llama2.mojo performance on Mac is right up there with llama.cpp (!!!), and even outperforms plain C in many cases. This is insane!
Tweet media one
Tweet media two
Tweet media three
8
22
148
1
1
8
@tairov
Aydyn Tairov
7 months
@justthisguy @Modular_AI @ggerganov llama2.🔥 is a port of @karpathy 's llama2.c; the biggest supported model so far is TinyLlama with 1.1B parameters. I hope we can soon run bigger quantized models as well
1
0
6
@tairov
Aydyn Tairov
6 months
This is Wow
@BrendanBycroft
Brendan Bycroft
6 months
Project #2 : LLM Visualization So I created a web-page to visualize a small LLM, of the sort that's behind ChatGPT. Rendered in 3D, it shows all the steps to run a single token inference. (link in bio)
114
1K
6K
0
0
7
@tairov
Aydyn Tairov
9 months
@clattner_llvm This is what happens when you launch a new exciting technology that many enthusiasts have been eager to try out right before the weekend 👨‍💻
0
0
7
@tairov
Aydyn Tairov
6 months
The Mojo truck is unstoppable! It now outperforms the Porsche 911! I think this could be a great trailerbuster for ModCon'23 😎
0
0
7
@tairov
Aydyn Tairov
7 months
In case anyone still doubts the importance of diving into learning and entering the AI field now. 👇
@karpathy
Andrej Karpathy
7 months
@keijikiriya_ @chrisalbon Are you kidding? There has never been a green pasture of this size with this low barrier to entry.
22
109
1K
0
1
7
@tairov
Aydyn Tairov
6 months
@Modular_AI I am particularly pleased that my project llama2.🔥 was also mentioned as an example of what can be achieved with inbuilt Mojo features.
Tweet media one
2
2
7
@tairov
Aydyn Tairov
8 months
I think it's worthwhile to share some interim results we got with the llama2.🔥 speedup. Our incredible GitHub contributors baked a draft PR. And here is what we have 👇
1
0
7
@tairov
Aydyn Tairov
8 months
I'll double check all results and will prepare a write-up with full details.
1
0
7
@tairov
Aydyn Tairov
8 months
#GPT4V can understand the essence of recursion even from a photo. Unbelievable!
Tweet media one
3
1
6
@tairov
Aydyn Tairov
6 months
@rasbt If anyone has it coded, I'm eager to benchmark it against my Mojo "reference" implementation
0
0
5
@tairov
Aydyn Tairov
6 months
@lexfridman CEO/Board misalignment
1
0
5
@tairov
Aydyn Tairov
7 months
Hope next year's Google presentation will be like this ( @Modular_AI ? )
@Modular
Modular
7 months
🦙 .🔥
Tweet media one
4
10
190
1
0
6
@tairov
Aydyn Tairov
9 months
It shows additional details on which loops were vectorized by the gcc compiler. So far it seems that the very first llama2.c vs Mojo comparison was fair: gcc aggressively vectorizes all the loops it can find 😀
Tweet media one
0
2
6
@tairov
Aydyn Tairov
6 months
@elonmusk Thanks for sharing! I already have it implemented in Mojo 🔥, so now it's 35,000x faster than any other implementation
1
2
6
@tairov
Aydyn Tairov
5 months
Who remembers this? From 2008, the Google Search Appliance led the way in on-prem enterprise search solutions, deployed as a rack-form-factor black-box device. It was discontinued after 2018
Tweet media one
1
0
6
@tairov
Aydyn Tairov
7 months
Is it only me, or did OpenAI just kill N+ startups I was going to develop? 🤓 #OpenAIDevDay
2
0
6
@tairov
Aydyn Tairov
7 months
I discovered podcasts on X. Today I was invited to a really nice one! Thanks @altryne for the opportunity to share highlights & my experience with the early release of the Mojo SDK on Mac from @Modular_AI . PS. My moment of fame starts at the 59th min 🔊
@altryne
Alex Volkov (Thursd/AI)
7 months
T-minus 2 hours for @thursdai_pod live recording, and as always, if you can't make it to the live one, make sure you're subscribed on to receive the episode in newsletter and podcast form
0
1
6
0
0
5
@tairov
Aydyn Tairov
8 months
Looking forward. 🔥 on Mac, one ♥️
@Modular
Modular
8 months
Mojo 🔥 is coming to Mac 💻 very soon 😱 Here’s a little sneak peak of us testing LLama2.🔥 out of the box by @tairov . Look for this to drop in the next couple of weeks 💯🚀
22
62
385
0
0
5
@tairov
Aydyn Tairov
6 months
@ggerganov Sounds cool, but I think it might be hard to compete with AWS on its own territory 😀 from a cost perspective. They already have the AWS Bedrock service rolling out; it's a kind of API to many LLM models where you pay for the tokens used
2
1
5
@tairov
Aydyn Tairov
7 months
Make programming great again!
@clattner_llvm
Chris Lattner
7 months
Epic work, the thing I love about this is how small and clean the code is - literally reimplementing everything down to the metal instead of depending on thick layers of magic.
5
36
362
0
0
5
@tairov
Aydyn Tairov
7 months
@raxtechbits @Modular_AI @ggerganov @clattner_llvm I think everyone in the AI world should know about this, so feel free to be the first retweeter 😉
1
0
5
@tairov
Aydyn Tairov
6 months
Gemini, the avant-garde and trailblazing multi-modal virtuoso of language models, state-of-the-art titan, infused with wit and wisdom far beyond its digital peers. It's an inventive, quick-witted behemoth, eclipsing predecessors with its sterling adaptability! Gemini:
Tweet media one
1
0
4
@tairov
Aydyn Tairov
8 months
Have you ever wanted to benchmark baby Llama2 models in 12 programming languages? No? Well, now you can!
1
2
5
@tairov
Aydyn Tairov
9 months
There were some debates regarding the fairness of the C vs Mojo comparison. I was in doubt whether it was fair or not, since in Mojo I deliberately introduced SIMD operations. After some research I found an interesting gcc switch: `-fopt-info-vec`
1
0
5
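A small sketch of how that check can be scripted from Python: compile with `-fopt-info-vec` and filter the vectorization report; the source file name and the extra compiler flags are assumptions for illustration.

```python
import subprocess

# compile the llama2.c source (assumed to be run.c in the current directory)
# and ask gcc to report which loops it auto-vectorized
cmd = ["gcc", "-O3", "-march=native", "-fopt-info-vec", "-o", "run", "run.c", "-lm"]
result = subprocess.run(cmd, capture_output=True, text=True)

# the optimization report is written to stderr, one line per vectorized loop
for line in result.stderr.splitlines():
    if "vectorized" in line:
        print(line)
```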
@tairov
Aydyn Tairov
6 months
@var_epsilon I didn't know this model was capable of generating photo-realistic images of a supercar.
0
0
3
@tairov
Aydyn Tairov
7 months
@cpavel866 @Modular_AI @ggerganov I wouldn't say llama.cpp is 10x faster on M1 Metal. It probably has a 2x boost. I'm eager to benchmark Mojo with GPU support once it's released.
1
0
5
@tairov
Aydyn Tairov
7 months
@4evaBehindSOTA Hey @4evaBehindSOTA ! Wanna give it "Round 3"? :) We added unrolling improvements, and now it hits 1000 tok/s for stories15M. Pull the latest changes and use -j 6; it seems that with threads = 6 it works even better
Tweet media one
1
0
5
@tairov
Aydyn Tairov
7 months
Solid optimization lifting #python port from slowest to more competitive! Great to see the Python community working hard on the perf challenge!
1
0
5
@tairov
Aydyn Tairov
7 months
@clattner_llvm What about "tok/s" ? 😀
0
0
2
@tairov
Aydyn Tairov
7 months
@lexfridman @clattner_llvm Thank you for the feedback! I'm sure there are even more achievements to come in this space
0
0
5
@tairov
Aydyn Tairov
8 months
We've reached another milestone with support for the 1.1B TinyLlama, which can now generate advanced responses, like explaining the Pythagorean theorem or providing Python code for calculating the Fibonacci sequence. Impressive performance for a 4GB model!
Tweet media one
0
0
5
@tairov
Aydyn Tairov
8 months
@hnasr Exactly. And this is how I leveraged SIMD primitives to speed up llama2 inference by 15-20% compared to the C implementation
1
0
5
@tairov
Aydyn Tairov
6 months
I'm eager to benchmark it against any other reference Q-Learning implementation, as soon as one is available 🤓 This is probably the only opinionated prototype so far. I'm afraid the competitors don't stand a chance either way
@Modular
Modular
6 months
Q-Learning works better in Mojo🔥🚀Amazing work @tairov 💯
3
10
135
0
0
5
@tairov
Aydyn Tairov
9 months
462 vs 385 tok/s
0
0
5
@tairov
Aydyn Tairov
6 months
Here is real value for whoever comes to the comments: Modular is giving away free tickets to ModCon 2023 + swag 🎁. It seems it's still wide open and the competition is low! I see it as a prime opportunity to implement some classic algo in Mojo for a solid chance at a win. I'd love to
0
1
5
@tairov
Aydyn Tairov
7 months
#gpt4V pretending it's not involved in this mess.
Tweet media one
1
0
5
@tairov
Aydyn Tairov
9 months
👀
Tweet media one
1
0
5
@tairov
Aydyn Tairov
7 months
@karpathy This is already the last century. Let your LLM agent bring you information in a convenient format, without the need to surf at all.
0
0
5
@tairov
Aydyn Tairov
7 months
Cloud-exits might become a trendy thing.
@iximiuz
Ivan Velichko
7 months
It's absolutely crazy how cheap the computing power is if you stay outside of the Cloud 🤯 The usage of iximiuz Labs keeps growing, so I'm upgrading my bare metal servers. And I just doubled the fleet's CPU capacity with the price going from $44 to $53 per server per month.
Tweet media one
11
12
157
1
1
4
@tairov
Aydyn Tairov
6 months
Gemini must be a beast. Outperforming 85% of participants in a typical Codeforces competition is wild! It's like solving 4-5 medium/hard Leetcode problems within 2.5 hours.
@RemiLeblond
Rémi Leblond
6 months
So excited to share what the team and I have been working on these last months! #AlphaCode 2 is powered by Gemini and performs better than 85% of competition participants in 12 contests on Codeforces! More details at @GoogleDeepMind
Tweet media one
19
84
477
1
0
4
@tairov
Aydyn Tairov
6 months
@dannypostmaa Why not switch to this model permanently, then?
0
0
3
@tairov
Aydyn Tairov
6 months
Here is why stock inference implementations are not a good fit for production workloads. "The compute costs are eye-watering" (c) #ModCon23
Tweet media one
3
0
4
@tairov
Aydyn Tairov
6 months
llama2-py improved significantly: meet llama2-numpy. The SLOC count dropped 3x as well. Nicely implemented inference in 350 lines of almost pure Python 😀
@CertumIter
Christophe Alexandre
6 months
@tairov @karpathy Thank you Aydyn! I love the idea of a pure Python implementation with no dependence to an external package. I started from your version and introduced numpy: ... using the standard interpreter we get to 2x slower from 10x slower compared to C version.
0
0
1
0
2
5
@tairov
Aydyn Tairov
6 months
Meanwhile AMD is also presenting something, obviously for AI. I'm exhausted and have no time to keep up with everything. Hey Grok, maybe your qdrant-based vector search can help summarise this? 😀
2
0
4
@tairov
Aydyn Tairov
6 months
Context is not all you need. As this research highlights, LLMs struggle with basic contextual understanding as the reasoning context grows more complex. Without a framework firmly grounding symbols in reality, model performance degrades. As was demonstrated on OpenAI
@GregKamradt
Greg Kamradt
6 months
Claude 2.1 (200K Tokens) - Pressure Testing Long Context Recall We all love increasing context lengths - but what's performance like? Anthropic reached out with early access to Claude 2.1 so I repeated the “needle in a haystack” analysis I did on GPT-4 Here's what I found:
Tweet media one
163
566
3K
1
0
4
@tairov
Aydyn Tairov
6 months
99% of TikTok influencers lose their jobs/audiences because of this?
@jfischoff
Jonathan Fischoff
6 months
“Animate Anyone” was released last night for making pose guide videos. Lets dive in. Paper: Project: 🧵1/
Tweet media one
14
121
591
2
1
4
@tairov
Aydyn Tairov
6 months
@digicalidesign @Modular_AI A Mojician, of course 🔥
1
0
3
@tairov
Aydyn Tairov
9 months
@Modular_AI Hi @ylecun ! 🙌 I've been diving deep into the new Mojo lang by implementing #Llama2 inference in it. We'd love to hear your insights on Mojo and its stated capabilities
1
1
4
@tairov
Aydyn Tairov
6 months
@lexfridman @michaelmalice Real-time Stable Diffusion video is fascinating; look how it dressed Lex as a pirate in this interview!
0
0
3
@tairov
Aydyn Tairov
6 months
Ideal explanation of all aspects of LLMs, including the security concerns, in such a condensed form. It's brilliant how succinctly the information is conveyed. @karpathy 's videos are examples of almost perfect compression of huge ML topics into an accessible form.
@karpathy
Andrej Karpathy
6 months
New YouTube video: 1hr general-audience introduction to Large Language Models Based on a 30min talk I gave recently; It tries to be non-technical intro, covers mental models for LLM inference, training, finetuning, the emerging LLM OS and LLM Security.
Tweet media one
585
3K
18K
0
0
3
@tairov
Aydyn Tairov
8 months
Now, on average, we're performing slightly better than multithreaded #llama2c . We're able to further improve the vectorization/parallelization of the transformer's forward pass
Tweet media one
1
1
3
@tairov
Aydyn Tairov
6 months
If you want to avoid this kind of oversight in the future, make sure you come to ModCon '23 😉
1
0
3
@tairov
Aydyn Tairov
6 months
@radamar @Gradio Man, are you counting how many startups closed because you implemented their primary product on Gradio? 😅
0
0
3
@tairov
Aydyn Tairov
6 months
How to run your fine-tuned LM app on top of an underlying LLM-OS kernel
@oscar_zhiqiu_xu
Zhiqiu (Oscar) Xu
6 months
You don’t have to train from scratch whenever developing a smaller model of an existing model family. Sharing our latest work - “Initializing Models with Larger Ones” arxiv preprint: code:
Tweet media one
6
54
361
0
0
3
@tairov
Aydyn Tairov
7 months
Best place to keep up with the latest changes in the AI world 👍
@altryne
Alex Volkov (Thursd/AI)
7 months
@ptsi @tairov Let's gooo! We actually had Aydyn on ThursdAI and talked about LlaMa.🔥
0
0
2
0
0
3