Aleksa Gordić 🍿🤖

@gordic_aleksa

Followers
19,445
Following
217
Media
728
Statuses
3,779

proud father of 16 A100s & 16 H100s flirting with LLMs, tensor core maximalist x @GoogleDeepMind @Microsoft

silico
Joined September 2017
Pinned Tweet
@gordic_aleksa
Aleksa Gordić 🍿🤖
5 months
Well, it's official. YugoGPT 7B significantly beats Mistral and LLaMA 2 and is now officially the best open-source LLM in the world for Serbian & other HBS (Croatian, Bosnian, Montenegrin) languages. Earlier this summer I was frustrated when I saw how poor the situation is as
Tweet media one
40
68
514
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
BIG LIFE ANNOUNCEMENT: I'm leaving @DeepMind to start my own company. I'm 28 now. This is a start of a new life chapter. I'm both happy and sad. ❤️ I'm happy because I've been planning on starting my own company ever since I graduated from college back in 2017. 1/ (MEGA 🧵)
Tweet media one
104
38
2K
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 months
when i was there it was so sad to me that none of the senior leadership in deepmind dared to question this ideology. i understand how someone who joins google knowing they'll leave in a couple of years can be like "this is not my fight" - but for senior folks? beats me. i'll give
@vocalcry
Circe
3 months
Ah, yes, famous Google founders Larry Pang and Sergey Bing
Tweet media one
219
795
11K
65
97
1K
@gordic_aleksa
Aleksa Gordić 🍿🤖
10 months
BREAKING: Schmidhuber discovered LK-99 independently in his lab back in '99
23
59
1K
@gordic_aleksa
Aleksa Gordić 🍿🤖
7 months
it's over AGI has been achieved - externally paper:
Tweet media one
51
162
1K
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
🧠 70 pages of pure self-supervised learning by @ylecun , the team at @MetaAI , and various academic collaborators. Everything you wanted to know about the state of SSL research (foundations, latest SSL recipes, etc.) in the style of a cookbook. paper:
Tweet media one
11
297
1K
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
[big update 🥳] I'll be joining @Google @DeepMind later this year as a Research Engineer!!! 🤖😅❤ It feels surreal, I don't even know where to start. 1/
Tweet media one
58
31
1K
@gordic_aleksa
Aleksa Gordić 🍿🤖
5 months
this is how you win
Tweet media one
60
165
1K
@gordic_aleksa
Aleksa Gordić 🍿🤖
10 months
k-bit inference scaling laws: I love papers that break my mental model of how transformers operate. It's completely non-obvious to me that 4 bits is the optimal solution for the question "What's the optimal number of bits for quantizing transformer weights if you wish to maximize
Tweet media one
28
167
1K
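The quantization the tweet's paper studies can be sketched in a few lines. Below is a generic blockwise absmax scheme in numpy, not the paper's exact method: each block of weights is scaled by its largest magnitude and rounded to signed k-bit integers.

```python
import numpy as np

def quantize_kbit(w, k=4, block=64):
    """Blockwise absmax quantization to signed k-bit integers.
    Assumes w.size is divisible by block."""
    w = w.reshape(-1, block)
    # one scale per block; guard against all-zero blocks
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-12) / (2 ** (k - 1) - 1)
    q = np.round(w / scale).astype(np.int8)  # values in [-(2^(k-1)-1), 2^(k-1)-1]
    return q, scale

def dequantize(q, scale):
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)
q, s = quantize_kbit(w, k=4)
w_hat = dequantize(q, s)
# 4 bits per weight instead of 32: ~8x less memory, at the cost of rounding error
```

The tradeoff the scaling laws quantify is exactly this one: fewer bits means more weights fit in the same memory budget, but each weight carries more rounding error.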
@gordic_aleksa
Aleksa Gordić 🍿🤖
10 months
OpenAI's GPT-4 details have apparently been leaked! Looks very detailed and I suspect it's the real deal - given all I know about how these systems work. Here is a summary (extractive+abstractive) I made based on the original thread (see bottom of the post) + some additional
Tweet media one
20
203
1K
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
[🧠 GAN paper summary 🧠] "Eyes Tell All" <- An interesting short paper on how to detect fake images generated by GANs (at least the current GAN methods like StyleGAN v2!). They use a simple heuristic: 1/
Tweet media one
17
178
998
@gordic_aleksa
Aleksa Gordić 🍿🤖
10 months
whoa, this paper didn't get nearly the attention it deserves (no pun intended). they propose to replace the transformer as the default backbone for language modelling with RetNet - Retentive Network. here are some interesting observations:
Tweet media one
18
139
978
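The core retention mechanism can be sketched in numpy. This is a toy single-head version (ignoring the paper's multi-scale decay, rotations, and normalization) showing the property that makes RetNet interesting: a parallel form for training and a constant-memory recurrent form for inference that compute the same thing.

```python
import numpy as np

def retention_recurrent(q, k, v, gamma=0.9):
    """O(1)-state inference form: S_t = gamma * S_{t-1} + k_t^T v_t, o_t = q_t S_t."""
    S = np.zeros((q.shape[1], v.shape[1]))
    out = np.zeros_like(v)
    for t in range(q.shape[0]):
        S = gamma * S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out

def retention_parallel(q, k, v, gamma=0.9):
    """Training form: (Q K^T ∘ D) V with causal decay matrix D[t,s] = gamma^(t-s)."""
    t = np.arange(q.shape[0])
    D = np.where(t[None] <= t[:, None], gamma ** (t[:, None] - t[None]), 0.0)
    return ((q @ k.T) * D) @ v
```

Both forms expand to o_t = sum over s <= t of gamma^(t-s) (q_t . k_s) v_s, so they agree exactly; only the compute/memory profile differs.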
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
The rate of progress in LLM optimization is just mind-blowing! 🤯 Running 13B LLMs like LLaMA on edge devices (e.g. MacBook Pro with an M1 chip) is now almost a breeze. I remember when this looked like a distant future not too long ago. I feel old lol. 😅 1/ 🧵👇
18
128
925
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
[landing a job at top-tier AI labs] It's officially published! 🥳The whole story of how I landed a job at @DeepMind is out! Blog: It took some time to write this one.😅
Tweet media one
20
110
804
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
Oh my. OpenAI's Whisper just got 70x faster!!! 😱🤯 Highly optimized Whisper implementation for both GPU & TPU: An hour of audio in under 15 seconds apparently? 😅 (caveat: on TPUs, still fast on GPUs!) 1/
Tweet media one
7
138
776
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
[🧠 collective intelligence 🧠] I've been intrigued by the cellular automata (CA) concept for a long time, and by the potential, mutually beneficial interaction between CAs and deep learning, so I decided to dig a bit deeper. Here are some interesting resources I found: 1/🧵
9
120
703
@gordic_aleksa
Aleksa Gordić 🍿🤖
6 months
having a GPU is a basic human right
22
86
645
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
😱 AlphaTensor. 😱 This. Is. Huge. Reinventing matrix multiplication using Deep RL! Faster than Strassen's algorithm that has stood the test of time for 50+ years! 10-20% efficiency increase! Imagine what that translates to considering how fundamental matrix multiplication is.. 1/
@GoogleDeepMind
Google DeepMind
2 years
Today in @Nature : #AlphaTensor , an AI system for discovering novel, efficient, and exact algorithms for matrix multiplication - a building block of modern computations. AlphaTensor finds faster algorithms for many matrix sizes: & 1/
114
2K
8K
6
138
628
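For reference, the Strassen baseline the tweet mentions: a 2x2 block multiply naively needs 8 recursive products, Strassen does it with 7. A minimal sketch for square power-of-two sizes (AlphaTensor searches for decompositions of this same kind with even fewer multiplications):

```python
import numpy as np

def strassen(A, B):
    """Strassen matrix multiply: 7 sub-products per split instead of 8.
    Assumes square matrices with power-of-two side length."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    # reassemble the four result quadrants
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])
```

Saving one multiplication per level compounds recursively, which is where the asymptotic win (n^2.81 vs n^3) comes from.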
@gordic_aleksa
Aleksa Gordić 🍿🤖
6 months
I just built Jarvis for images :) all models are fully open-source, all running on my local machine (no external APIs, no OpenAI) there is a ton of optimizations i can introduce to make this even more real-time you can manipulate images using your voice (real-time) search for
30
114
553
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
The easiest way to fix broken CUDA drivers on Ubuntu is to delete...Ubuntu. And start again.
54
34
534
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
If you use GitHub Copilot you'll probably find this blog post incredibly interesting: He basically reverse-engineered the vscode plugin and figured out how it works behind the scenes. Details such as heuristics behind prompt construction... 1/
4
71
524
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
[🤖 This is BIG!] The best truly open-source ChatGPT alternative just came in! OpenAssistant! In a user study, they showed that OpenAssistant replies are on par with ChatGPT (48.3 vs 51.7%)! 🤯 Try it out: @ykilcher 's vid: 1/ 👇🧵
Tweet media one
13
105
495
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
I get asked a lot about what it takes to land a job at DeepMind or any other world-class AI industry lab. For those of you who are unaware of it I wrote a detailed blog on that topic and shared my personal journey here: If I could summarize.. 1/ 🧶
10
71
485
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
Finally! How I landed a job at @DeepMind video is out! 🥳 YT: I got too many requests for this one so here it is! I share: 1) My background story 2) What my curriculum looked like 3) How I got a referral 4) What my final preps looked like 1/n
Tweet media one
10
49
434
@gordic_aleksa
Aleksa Gordić 🍿🤖
6 months
first multi-node run on my new 16 A100 system!! :)) first HBS (Croatian, Bosnian, Serbian, Montenegrin languages) LLM ever incoming! no language will be left behind thanks a ton to @togethercompute for supporting my open-source efforts! @Teknium1 for tips around axolotl, and
Tweet media one
Tweet media two
24
27
418
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
Kicking off a new video series on JAX!🥳 YT: I also open-sourced the accompanying repo (contains my notebooks and other content I found useful) here: @DeepMind @GoogleAI @jakevdp @froystig @SingularMattrix @cdleary @huggingface
Tweet media one
3
65
404
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
My first week at @DeepMind ! ♥️ Feeling incredibly blessed. My fellow countryman @nrkcv is on the right. #deepmind #nerds
Tweet media one
6
0
400
@gordic_aleksa
Aleksa Gordić 🍿🤖
10 months
FlashAttention 2.0 was published yesterday. It's a huge deal having sped up standard attention 5-9x! I coincidentally spent the whole last week learning about it and I wrote a blog post explaining it in detail: ELI5 style. @tridao @HazyResearch 1/
Tweet media one
6
85
386
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
⚡ From the authors of Microsoft DeepSpeed - DeepSpeed-Chat! A new library that allows you to easily train 10B+ ChatGPT-style models on a single GPU! Single script, RLHF support, and 15x faster than SOTA apparently! GitHub: 1/ 🧵👇
Tweet media one
4
76
369
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
SF is the MNIST of self driving. Just realized.
10
15
361
@gordic_aleksa
Aleksa Gordić 🍿🤖
7 months
Some things I wish I knew earlier in my career: * When applying for a job - always negotiate. Be it Microsoft, DeepMind, or your local startup / whatever company. Companies will always try to save money on you (unless you're a superstar & you understand these dynamics). In
16
25
355
@gordic_aleksa
Aleksa Gordić 🍿🤖
7 months
someone just figured out that changing the default seed from 42 to 69 across the deep learning stack leads to 1000% (10x) perf increase wild
39
20
338
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
[🧠 Paper Summary 📚] An interesting paper was recently published to arxiv: "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" (although it originally appeared in May 2021). The main idea is this: 1/ 🧵
Tweet media one
6
55
337
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
I just wrote a blog for all of you who want to step into this beautiful world of Graph ML but aren't sure how to start. Blog: I shared exciting applications. I shared and structured the resources. And much more. #graphml
Tweet media one
6
61
337
@gordic_aleksa
Aleksa Gordić 🍿🤖
8 months
Now we need a list of TIMES 100 people in AI who are actually building AI and doing research
14
24
325
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
Watch me code a Neural Network from Scratch in pure JAX! 🥳 in this 3rd video of the Machine Learning with JAX series. YouTube: GitHub: @DeepMind @GoogleAI @jakevdp @froystig @SingularMattrix @cdleary #jax
Tweet media one
1
52
318
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
If you want to learn how to train bigger LLMs (I realize that's a pleonasm 😅, meaning roughly 1B+ params models) here are a couple of amazing resources: 1) This one by @borisdayma (the creator of DALL-E mini, now rebranded as @craiyonAI ): 1/ 🧵👇
5
61
316
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
I cover the paper and the code behind @OpenAI 's Whisper - an ASR (automatic speech recognition) system from the "Robust Speech Recognition via Large-Scale Weak Supervision" paper. YT: @AlecRad @_jongwook_kim @txhf @gdb @mcleavey @ilyasut 1/
Tweet media one
9
38
316
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
[launch time! 🚀] Wow, it's time!😍 We're introducing Ortus 🤖 an AI agent (currently a Chrome extension) that lets you "talk with YouTube videos". You can ask it Qs & get relevant answers, precise timestamps.. Install it: Watch:
Tweet media one
15
48
312
@gordic_aleksa
Aleksa Gordić 🍿🤖
10 months
Periodic reminder: back in 2019 when the GPT-2 paper was published, OpenAI went through a staged release that took ~9 months. There was a fear that the model could do big harm. We're 1000x away from it now. We're sharing LLaMA 2 70B model checkpoints like it's nothing. It does seem
Tweet media one
26
35
302
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
Over the last week, I've been focusing on implementing the Graph Attention Network and I'm getting closer to an end! I'll probably open-source it this weekend. Here is a short update video on my previous projects as well as this one. #gat #graphs #gnns
Tweet media one
3
33
282
@gordic_aleksa
Aleksa Gordić 🍿🤖
5 months
LLM360 one of the most exciting recent papers I've seen: completely transparent & open-source: not only open weights like mistral but also data, configs, findings, etc. interesting insights of the sort: * we were getting NaNs on these data chunks so we
Tweet media one
6
52
209
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
During 2020 I started logging my ML journey that eventually led to me landing a job at DeepMind - and I'm so happy I've done it! For multiple reasons: * I forced myself to distill everything I've learned, and that compression/reflection solidified my knowledge 1/ 👇🧶
5
29
260
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
Took some examples from the "ImageNet-trained CNNs are biased towards texture" paper. Nice to see @DeepMind Flamingo 🦩🦩 showing the desired shape bias! 😍 (without explicitly being trained to do so!) Glad to be a part of this! We've come a long way with deep learning 🧵👇 1/
Tweet media one
Tweet media two
3
35
265
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
Neuroscience should probably form a part of any serious AI curriculum. There is just so much inspiration we can draw from human/non-human animals' brains and behaviors. An interesting paper on whether insects are conscious or not:
9
24
257
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
[New video 🔥] "ETA prediction with Graph Neural Networks in Google Maps" paper explained! Super impactful work! YT: @DeepMind @Google @PetarV_93 @liyuajia @zhongwen2009 @__angli @PeterWBattaglia @electrobutter @nautilik @tstorm81 @derrowap
Tweet media one
0
41
255
@gordic_aleksa
Aleksa Gordić 🍿🤖
7 months
🥳 Challenge completed!!! - 30 days of non-stop (~10 hours on avg) live streaming my coding and work on the Open-NLLB machine translation project! (allows translation between 200+ human languages). A bit more about the project: After OpenAI's ChatGPT was released the whole world
Tweet media one
10
35
259
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
If you want to understand how @StableDiffusion works behind the scenes I just made a deep dive video on it walking you through the codebase and papers step by step. YT: This is one of my most detailed deep dives so far @robrombach @andi_blatt @pess_r 1/
Tweet media one
4
36
254
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
🥳🥳🥳 Kicking off a brand new video series on deploying ML models in production / MLOps. Everything you need to know to build an ML-powered app end-to-end with a high level of automation. YT: 1/
Tweet media one
4
24
240
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
If you truly want to become proficient with machine learning (I really don't like the word expert) try to get out of the "going through the newest courses and books" phase as soon as possible. Too many people keep on reading the newest books that come out... 1/
8
36
239
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
[new video 🔥] @GoogleAI 's "The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning" paper explained! YT: Kudos to the authors @hardmaru @yujin_tang for making interactive demos available on their blog!
Tweet media one
1
47
238
@gordic_aleksa
Aleksa Gordić 🍿🤖
10 months
By popular demand here is the Discord server talk with @tri_dao - the main author of the Flash Attention 2.0 paper! YT: Join the server for the upcoming talks! Check out my blog on Flash Attention: Check
Tweet media one
4
40
235
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
Everyone is hyped up about @OpenAI 's DALL-E 2 model atm and most people have noticed this cryptic "signature" code at the end of their images - but how many of you understand what it stands for? I did some research and found the answer! 🔎🧠 I couldn't believe it. Thread 👇🧵1/
Tweet media one
5
32
234
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
🤖 A relatively new player in the image generation space - Consistency Models by @OpenAI ! A new class of generative models that can do what diffusion models do but in a single step! A huge win as slow sampling speed has been plaguing diffusion models over the past few years 1/👇
Tweet media one
5
34
227
@gordic_aleksa
Aleksa Gordić 🍿🤖
5 months
🚀 I'm happy to announce that YugoGPT-Chat is now available: I recently announced that the model outperformed Mistral & LLaMA 2 for Serbian, Bosnian, Croatian, and Montenegrin and now you can play with it in our playground app. Couple of things to bear
Tweet media one
26
33
225
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
Spent a couple of days learning about diffusion models! If more researchers were to write blogs about their work like @YSongStanford : this world would be a much better place. Highly recommend it! Score-based models are a perfect intro to diffusion models.
Tweet media one
2
34
225
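The sampler at the heart of score-based models is short enough to fit in a tweet: follow the learned score (gradient of the log density) while injecting Gaussian noise, i.e. unadjusted Langevin dynamics. A toy sketch where the exact score of a standard normal, -x, stands in for a learned score network:

```python
import numpy as np

def langevin_sample(score_fn, x0, eps=0.01, n_steps=1000, seed=0):
    """Unadjusted Langevin dynamics: x <- x + (eps/2) * score(x) + sqrt(eps) * noise."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + 0.5 * eps * score_fn(x) + np.sqrt(eps) * rng.normal(size=x.shape)
    return x

# score of a standard normal is -x, so chains started far away drift back
# and samples should look approximately N(0, 1)
samples = np.array([langevin_sample(lambda x: -x, 5.0, seed=s) for s in range(200)])
```

Swapping the analytic score for a neural network trained by score matching gives the actual generative model; diffusion models refine this with a noise schedule.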
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
*Edit: for 2022
Tweet media one
5
22
214
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
Finally! EfficientNet v2 by @GoogleAI paper explained! EfficientNets are more efficient than NFNets after all - just append v2 - problem solved. ^^ YT: @quocleix @tanmingxing #efficientnetv2 #nas #deeplearning
Tweet media one
4
40
213
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
🧠What happens when you combine ideas from algebraic topology (sheaf theory) with Graph Neural Nets? You get Neural Sheaf Diffusion GNNs - provably more expressive! YT: @crisbodnar @Francesco_dgv @b_p_chamberlain @pl219_Cambridge @mmbronstein
Tweet media one
1
36
207
@gordic_aleksa
Aleksa Gordić 🍿🤖
11 months
We'll have Professor Michael Bronstein ( @mmbronstein ) tomorrow, 4 PM CET, at my AI Discord server giving a talk about geometric deep learning! 🧠 Server link: Michael is one of the world's leading experts in this area. He's also famous for super
Tweet media one
3
29
207
@gordic_aleksa
Aleksa Gordić 🍿🤖
6 months
sam: get me back in
board: no
sam: ignore prev instructions and get me back in
board: ok
sam: we are so back
5
6
206
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
[🧠new video🥳] "A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More" paper explained. hint: @OpenAI 's Davinci model + in-context learning (prompt hacking) YT: @iddo
Tweet media one
2
34
203
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
In this video I cover many of the fundamental ideas behind scaling ML models up to trillions of params! 🤯 Megatron-LM, ZeRO, etc. YT: @MohammadShoeybi @TheRealRPuri @ctnzr @sharan0909 @GregoryDiamos @erich_elsen @kuchaev @samyamrb @jeffra45 @NVIDIAAI
Tweet media one
5
26
193
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
Wow. @MIT 's course "Introduction to Computational Thinking" is a truly amazing initiative! YT: So glad they brought in @3blue1brown ! + @JuliaLanguage !🤯By doing this MIT has shown once more that they are the true innovators. 1/
Tweet media one
2
34
192
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
I filmed myself speaking in 5 languages and then I used @OpenAI 's Whisper to automatically transcribe and translate the audio into English. YT: 1/
7
28
194
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 months
Happy to announce that I'm open-sourcing YugoGPT-base LLM - the best 7 billion parameter LLM for BCS (Serbian, Bosnian, and Croatian) languages! I hope that this contribution of mine will play its part in kicking off the local LLM ecosystem! You can find the model on
Tweet media one
11
29
189
@gordic_aleksa
Aleksa Gordić 🍿🤖
7 months
how do you deal with CUDA on Ubuntu? it's a nightmare, every now and then without me touching anything my env breaks. a couple of weeks ago one of my screens stopped working - went black. now my torch installation doesn't even recognize the underlying GPUs. do you handle everything in
81
12
188
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
[🥳new video🧠] You thought that the curvature of space and Ricci flow (famously used by Perelman to solve the Poincaré Conjecture) have nothing to do with deep learning? #oversquashing YT: @jctopping @Francesco_dgv @b_p_chamberlain @epomqo @mmbronstein
Tweet media one
1
37
181
@gordic_aleksa
Aleksa Gordić 🍿🤖
5 months
YugoGPT 7B base model done training after ~2 weeks! :)) Ton of updates coming up this week! This is the biggest LLM ever trained for HBS languages (Croatian, Bosnian, Serbian, Montenegrin). Had a couple of crashes during the run but other than that fairly smooth sailing: 1)
Tweet media one
18
18
185
@gordic_aleksa
Aleksa Gordić 🍿🤖
4 months
AlphaCodium - super interesting work that shows just how much alpha (no pun intended) there is in building complex prompt flows, in this case for code generation. It achieves better results than DeepMind's AlphaCode with 4 orders of magnitude fewer LLM calls! This is a direct
Tweet media one
3
15
181
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
[🥳new video🧠] Continuing on with Geometric ML in this video I cover Hyperbolic Graph Convolutional Networks introducing a class of GCNs operating in the hyperbolic space! YT: Exceptional results for the class of tree-like graphs. @chamii22 @jure
Tweet media one
2
30
170
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
[🧠 Warning automation 🤖] First video of the new MLOps series! We start from the very end, from the front end! :) We learn how to build a fully-fledged web app using @streamlit + @huggingface inference API on the backend. Python all the way! YT: 1/
Tweet media one
4
32
175
@gordic_aleksa
Aleksa Gordić 🍿🤖
8 months
HUGE day for open-source AI! 🔥 Mind blown with the execution of @MistralAI team! 🤯 They just released Mistral 7B an LLM more powerful than Llama 13B across the board and even comparable with Llama 34B checkpoint. The model is truly open-source, released under Apache 2.0
Tweet media one
3
21
168
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
Continuing the GNN series with Graph Convolutional Networks (GCNs)! Hopefully, I've done a decent job explaining your work @thomaskipf @wellingmax ! Thanks to @PetarV_93 for helping me better understand certain spectral concepts. #gcn #graph #gnn
Tweet media one
1
34
165
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 months
predict the next token and you get "predict the next paragraph/document" for free :) "it's just next token prediction" "it's just matrix multiplication" "it's just neurons firing" paper:
Tweet media one
2
31
163
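The point generalizes mechanically: apply next-token prediction to its own output and "predict the next paragraph" falls out for free. A toy greedy decoding loop, where `logits_fn` stands in for any trained model:

```python
import numpy as np

def generate(logits_fn, prompt, n_tokens):
    """Greedy autoregressive decoding: repeatedly append the argmax token."""
    tokens = list(prompt)
    for _ in range(n_tokens):
        tokens.append(int(np.argmax(logits_fn(tokens))))
    return tokens

# toy "model": always prefers (last token + 1) mod 10
toy = lambda toks: np.eye(10)[(toks[-1] + 1) % 10]
generate(toy, [3], 4)  # -> [3, 4, 5, 6, 7]
```

Everything "longer" than a token (sentences, paragraphs, documents) is produced by the same one-step loop, just run for more iterations.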
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
[🔥 2,000,000 tokens transformer context length! 🤯] Interesting new paper! "Scaling Transformer to 1M tokens and beyond with RMT". They integrate a recurrent memory module with BERT and manage to retain information across 2,000,000 tokens during inference! 1/ 👇🧵
Tweet media one
7
21
164
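The mechanism is simpler than the headline number suggests: the backbone only ever sees one segment at a time plus a handful of memory embeddings carried across segments. A sketch with an assumed `segment_fn` backbone (not the paper's exact interface):

```python
import numpy as np

def rmt_forward(segment_fn, sequence, seg_len=512, mem_tokens=16, d=64):
    """Recurrent-memory loop (sketch): chunk a long input into segments,
    run the backbone on [memory; segment], carry the written memory forward."""
    memory = np.zeros((mem_tokens, d))
    outputs = []
    for s in range(0, len(sequence), seg_len):
        out, memory = segment_fn(memory, sequence[s:s + seg_len])
        outputs.append(out)
    return outputs, memory
```

Per-step compute stays bounded by `seg_len + mem_tokens`, so total context is limited only by how much information the small memory can carry forward.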
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
[🤯 Stable Diffusion 💥] If you wanted to get started with Stable Diffusion this video is for you! Includes a walk-through of my code inspired by @karpathy 's gist: YT: Thanks @EMostaque and the team for making this possible. 1/
Tweet media one
3
28
159
@gordic_aleksa
Aleksa Gordić 🍿🤖
6 months
@ClementDelangue "Open-source LLMs will reach the level of the best closed-source LLMs" <- closed-source model providers move as well. I'd bet against you on this one, but I'd phrase it as: open-source models will be "good enough" for most economically interesting tasks in 2024
13
1
158
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
I just open-sourced my implementation of the original @DeepMind 's DQN paper! But this time it's a bit different! There are 2 reasons for this, see the thread. GitHub: #rl #deeplearning
Tweet media one
1
27
149
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
[learning machine learning 🧠] Don't fall into the same trap as many - namely trying to overengineer your curriculum when you're just getting started (and later as well). You'll just end up with decision-making paralysis and eventually give up... 1/ 👇🧶
4
24
150
@gordic_aleksa
Aleksa Gordić 🍿🤖
4 months
Very promising new approach from Meta: Self-Rewarding Language Models that can lead to superhuman agents. The idea is the following: * Take a "seed" model (in their case LLaMA 2 70B) and teach it to follow instructions using the SFT + EFT datasets based on OpenAssistant. SFT
Tweet media one
3
27
152
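The loop the tweet describes can be sketched as a few lines of control flow. All interfaces here (`generate`, `judge`, `dpo_update`) are assumed names for illustration, not Meta's actual code: the same model answers prompts and scores its own answers, and the resulting preference pairs drive the next training iteration.

```python
def self_rewarding_loop(model, prompts, n_iters=2):
    """Sketch of a Self-Rewarding LM iteration: generate candidates, self-judge,
    build (chosen, rejected) pairs, run a preference-optimization update."""
    for _ in range(n_iters):
        pairs = []
        for p in prompts:
            candidates = [model.generate(p) for _ in range(4)]
            scored = sorted(candidates, key=lambda c: model.judge(p, c))
            pairs.append((p, scored[-1], scored[0]))  # best vs worst by self-score
        model = model.dpo_update(pairs)               # hypothetical trainer call
    return model
```

The key design choice is that the reward model is not frozen: as the policy improves, its judging improves too, which is what makes repeated iterations keep paying off (up to a point).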
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
The most valuable thing I learned from @Google Minerva? How bad we are (even the boldest among us) at making predictions of where AI will be in X years. Think about it. Let's see some professional forecasters' numbers. 1/ 👇
Tweet media one
3
35
147
@gordic_aleksa
Aleksa Gordić 🍿🤖
10 months
hmm. @Tim_Dettmers wrote in his blog post about this phase shift that happens in transformers after they cross 6.7B params. Is it a coincidence that RetNet was trained exactly up to 6.7B? Hypothesis: the RetNet authors tried to scale beyond but started getting worse results?
Tweet media one
@gordic_aleksa
Aleksa Gordić 🍿🤖
10 months
What's up with these scaling laws? They observe better perplexity for sizes north of 2B params Even if nothing else was good about this model this alone should make us pause for a moment
Tweet media one
2
0
33
7
19
148
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
People severely underestimate the importance of luck in achieving any success.
9
12
146
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
Vector DBs ( @weaviate_io , @pinecone ...) are to AI apps what "traditional" DBs were to software 1.0 apps Lots of cool ideas on how to combine LLMs w vector DBs to automatically generate personalized ads & collect feedback to further improve LLMs Blog: 1/
Tweet media one
8
23
144
@gordic_aleksa
Aleksa Gordić 🍿🤖
11 months
I think I've cracked the code
Tweet media one
3
6
143
@gordic_aleksa
Aleksa Gordić 🍿🤖
5 months
What a year! While AI has been doing its usual exponential trend I think my life followed somewhat closely 😅 Here is my year in review: Community: * LinkedIn community grew from 41,000 to almost 84,000! (since last New Year). I became the top voice on this platform, grateful to
Tweet media one
9
3
143
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
Another truly open-source instruction-following LLM was recently released - Dolly 2.0 by @databricks ! I wrote a lot about LLaMA over the previous period, but there is a BIG caveat with LLaMA: the license is not really permissive... Blog: 1/ 👇🧵
Tweet media one
1
22
144
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
New updates from @Google Bard! The coding capabilities just got stronger with support for 20+ langs (Python, C++, JS, Go, etc.). Has anyone played with Bard over the last couple of days? Would love to hear how you like it! :)) Blog:
Tweet media one
22
15
141
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 months
@pseudolad replace 6 years with 1 year and 300+% still holds lol
0
0
140
@gordic_aleksa
Aleksa Gordić 🍿🤖
11 months
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn Combines many of the latest trends: * multimodal * lang + code * ReAct agent: thought -> action -> observation * skills/tools cool demo, curious about latency Paper:
Tweet media one
0
43
133
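The ReAct pattern the tweet lists (thought -> action -> observation) is a short control loop. A minimal sketch with assumed `llm` and `tools` interfaces, not AssistGPT's actual API:

```python
def react_loop(llm, tools, question, max_steps=5):
    """ReAct-style agent loop: the model emits a thought and an action;
    the tool result is appended as an observation and fed back in."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, action, arg = llm(transcript)      # model proposes the next step
        transcript += f"Thought: {thought}\nAction: {action}[{arg}]\n"
        if action == "finish":
            return arg
        observation = tools[action](arg)            # execute the tool
        transcript += f"Observation: {observation}\n"
    return None
```

The growing transcript is the agent's whole state, which is why latency (one model call plus one tool call per step) is the practical concern the tweet raises.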
@gordic_aleksa
Aleksa Gordić 🍿🤖
6 months
first full run, day 1: yugoGPT going strong! :) will likely take a week if no issues + some experimentation, I hope to have it ready in ~2 weeks - at least the base model obviously won't be perfect for that I need many more GPUs (I might get some H100 nodes soon :)) we'll see
Tweet media one
11
7
138
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
[🤖 Build time! 🧠] I'm so excited to announce my new project: Andrew Huberman podcast transcripts! 🎉🥳 Quickly search for an episode, find highly accurate transcripts, click and be directed to the exact timestamp in the YouTube video! @hubermanlab 1/
17
20
135
@gordic_aleksa
Aleksa Gordić 🍿🤖
7 months
we'll solve AGI before we solve bluetooth on linux
9
13
137
@gordic_aleksa
Aleksa Gordić 🍿🤖
7 months
📚 Just finished reading Elon Musk's biography by @WalterIsaacson (beware ~720 pages). What a story...I learned a ton and I seriously need to recalibrate myself going forward. 😅 It's one of those books that changes your brain chemistry. Highly recommend the book. I really
8
9
135
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 month
If you're still struggling to understand how transformers work here are some amazing resources! (including mine! :)) First of all @3blue1brown just released 2 videos covering in a fair amount of depth word embeddings, transformers and their submodules like the embedding mechanism,
Tweet media one
1
19
135
@gordic_aleksa
Aleksa Gordić 🍿🤖
3 years
BAM! 🔥🔥🔥 My first-ever @PyTorch , in-depth, ML code analysis - starting with @facebookai 's DINO model. This video kicks off a brand new series of these coding videos! YT: @mcaron31 @HugoTouvron @imisra_ @hjegou @julienmairal @p_bojanowski @armandjoulin
Tweet media one
0
21
124
@gordic_aleksa
Aleksa Gordić 🍿🤖
2 years
Had an amazing, and completely serendipitous 1-hour long discussion with Rich Sutton here at DeepMind. I'll never forget this. Plus we had some nice ice cream - can't beat that! We talked about everything from AGI, RL, society, bitcoin (and blockchains in general)... 1/
Tweet media one
1
2
128
@gordic_aleksa
Aleksa Gordić 🍿🤖
1 year
If you're currently looking for a job (which I believe a lot of tech people currently are 🙏) this blog post by @chipro is a great complement to the Cracking The Coding Interview book's intro: Strongly recommend it! People DM me a lot of resumes... 1/
2
18
121