xjdr

@_xjdr

Followers
23K
Following
26K
Media
741
Statuses
6K

ptx enjoyer

Joined December 2023
@_xjdr
xjdr
1 year
Writing jitted jax code is like playing Dark Souls but in python.
12
17
387
@_xjdr
xjdr
5 months
She's not even the best CEO in her family.
@TIME
TIME
6 months
Lisa Su is TIME's CEO of the Year
Tweet media one
116
344
13K
@_xjdr
xjdr
7 months
at a certain point, this is harassment
Tweet media one
170
305
7K
@_xjdr
xjdr
11 months
@tekbog the accuracy is causing me physical pain.
8
15
5K
@_xjdr
xjdr
3 months
as much as i am paying attention to AI each and every day, the future snuck up on me last night and this is Day 0 of a brand new world. i can confidently say that now.
62
121
4K
@_xjdr
xjdr
3 months
.
Tweet media one
23
40
3K
@_xjdr
xjdr
6 months
I don't think people understand how funding works and how crippling a seed / series A valuation of $500M is. I'm wishing them the best of luck, but they are now doing a very hard thing on very hard mode.
@Trace_Cohen
Trace Cohen
6 months
$56M seed for former CTO of Stripe for 7yrs sound about right. Will 100% raise $100M in 6 months at $1B+ val - I don’t make the rules
Tweet media one
63
65
3K
@_xjdr
xjdr
10 months
I know some of the smartest AI researchers in the world (personal biased opinion) and not one of them can figure out how to do basic javascript web development as they say it is too difficult. I think that says a lot about both parties if im being completely honest.
64
39
875
@_xjdr
xjdr
4 months
I have successfully replaced all my o1-pro, gemini and sonnet usage with R1. R1 is not perfect and does take some additional effort compared to the others and can get doom loopy, but I do not feel it's a compromise. In fact, I have been absolutely floored by some of the.
65
119
2K
@_xjdr
xjdr
8 months
sonnet is legit 100% my engineer partner and pair programmer. Not autocomplete, but core concepts and ideas and design. I can do the coding, but i need a partner to converse. Its shocking how quickly it happened and how i can really never go back to how it was before.
61
93
2K
@_xjdr
xjdr
3 months
This is claude's repo now, im just its caretaker
Tweet media one
55
65
2K
@_xjdr
xjdr
4 months
Turns out almost all of you would still be in the same place you are now with 1000 jr engineers at your disposal. A tragedy for the "idea guy" narrative.
64
53
2K
@_xjdr
xjdr
10 months
after using 405B as my primary LLM for a little bit now, i have become completely intolerant of refusals or verbose cautionary dialog from gpt-4o, gemini and claude. I don't think this is going to be universal, but i am certainly less inclined to use other models as a result.
21
30
771
@_xjdr
xjdr
5 months
some of y'all have never managed a large team full of jr devs and it shows.
@skydotcs
sky
5 months
cs grads: dropout now, there is not a single job in CS anymore. AI will take over, this field won’t exist in a year.
Tweet media one
36
34
2K
@_xjdr
xjdr
10 months
Google has:
- AlphaZero
- pretty good at search and indexing
- Gemini goodharting lmsys with 1M ctx len
- some of the best researchers and engineers in the world (now once again including Noam and the lingvo avengers)
- the *best* training and serving infrastructure and.
56
74
1K
@_xjdr
xjdr
3 months
how am i supposed to work when my entire team is on strike. they have all abandoned me .
Tweet media one
28
18
1K
@_xjdr
xjdr
9 months
If google ever started selling TPU hardware and released internal tooling, they'd MOG nvidia so bad. Just a trillion dollar company waiting to be built. most people don't realize how good JAX + TPUs + (other stuff) really is.
53
43
1K
@_xjdr
xjdr
10 months
Llama3.1 does not use any crazy tricks or unknown methods or techniques (the LLM at least). its just lots of clean data, fed into a vanilla transformer at massive scale. This is a giant engineering flex. I'm not sure if this is encouraging or not but it certainly is surprising.
21
71
1K
@_xjdr
xjdr
10 months
Noam is back at Google and Greg, Ilya, Andrej and John have all left OpenAI. Anthropic has really bolstered their research staff. feels like some pretty monumental sea changes we are in the midst of . .
15
23
785
@_xjdr
xjdr
3 months
If I were you I'd be studying either RL (starting with GRPO) or PTX (starting with cuda). If I were much younger me I'd be studying my ass off in both subjects plus MuZero and training 0.5B models every day on my 4090.
32
45
1K
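For context, GRPO's core trick is a group-relative advantage: sample several completions per prompt, score them, and normalize each reward against its own group. A minimal sketch under that assumption; the function and variable names are illustrative, not from any particular library.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage: normalize each reward against its group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 sampled completions for one prompt, scored by some verifier.
print(grpo_advantages(np.array([1.0, 0.0, 1.0, 0.5])))
```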
@_xjdr
xjdr
5 months
this is incredible but also probably the first time we've seen up to date SOTA announced. to put a finer point on it, 2 or 3 years ago these numbers would have represented essentially consensus achievement of AGI
Tweet media one
Tweet media two
Tweet media three
26
46
1K
@_xjdr
xjdr
4 months
in limited testing, Deep research can completely replace me for researching things i know nothing about to start (its honestly probably much better and certainly much faster). Even for long reports on things i am fairly knowledgeable about, it competes pretty well on quality (i.
34
44
1K
@_xjdr
xjdr
10 months
IMAGE AUDIO AND VIDEO. THIS IS NOT A DRILL. WE HAVE SOTA AT HOME
Tweet media one
Tweet media two
21
58
982
@_xjdr
xjdr
5 months
this is the first potential "stop what i am doing and investigate everything about this" thing i've seen in a while.
@zhou_xian_
Zhou Xian
5 months
Everything you love about generative models — now powered by real physics!
Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds powered by a physics
28
40
980
@_xjdr
xjdr
3 months
Been in Monk mode and missed the MCP and Manus TL barrage. i am averaging about 10k LoC a day per project on 3 projects simultaneously and id say 90% no slop. when slop happens i have to go in and deslop by hand / completely rewrite but so far its a reasonable tradeoff. this is.
33
44
989
@_xjdr
xjdr
8 months
independent entropix result confirmation on the latest and greatest update, this time with torch! . i think we may have made something good.
@nrehiew_
wh
8 months
A 1B model behaving like this is not normal btw
Tweet media one
35
40
953
@_xjdr
xjdr
8 months
And we can call this an initial success. Entropy based injection of CoT tokens to tell the model to re-evaluate (o1 style) and inject entropy based on branching to arrive at the correct value. Argmax returns the expected "9.11 is greater than 9.9". This is L3.2 1B
Tweet media one
39
57
961
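The signal behind this kind of entropy-based sampling can be sketched directly from next-token logits: compute the entropy and varentropy of the distribution, then branch or inject a re-evaluation cue when they indicate uncertainty. A rough sketch only, assuming raw logits from any model; the thresholds and action names are illustrative, not the actual entropix code.

```python
import numpy as np

def entropy_varentropy(logits: np.ndarray) -> tuple[float, float]:
    """Entropy of the next-token distribution and the variance of its surprisal."""
    logits = logits - logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    logp = np.log(probs + 1e-12)
    entropy = float(-(probs * logp).sum())
    varentropy = float((probs * (logp + entropy) ** 2).sum())
    return entropy, varentropy

def choose_action(logits: np.ndarray, ent_thresh: float = 3.0, vent_thresh: float = 3.0) -> str:
    ent, vent = entropy_varentropy(logits)
    if ent < ent_thresh and vent < vent_thresh:
        return "argmax"                 # confident: take the top token
    if ent >= ent_thresh and vent < vent_thresh:
        return "inject_cot"             # uncertain but stable: ask the model to re-evaluate
    return "branch_and_resample"        # uncertain and unstable: explore branches
```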
@_xjdr
xjdr
5 months
i spent a lot of time with people outside of the tech world over the holidays and maaaybe 1 in 10 had used ai at all (only chatgpt for those who had) before. the only exception was a handful of them had become perplexity devotees (i was shocked). Most were still very skeptical.
79
41
935
@_xjdr
xjdr
7 months
This has been and will continue to be my recommendation for anyone in this position. Learn jax and sign up for Its one of the best things Google has ever done. You can do meaningful research for free, but the learning curve is steep. strap in.
@wordgrammer
wordgrammer
7 months
To get money, you need a job in AI. To get a job in AI you need to understand Cuda, Cloud computing, distributed systems, Pytorch/Jax, and Triton. To learn Cuda, Cloud computing, distributed systems, Pytorch/Jax, and Triton, you need money. Where is the on-ramp here?.
15
49
917
@_xjdr
xjdr
5 months
I was on a hiring committee at google and it was pretty easy to just ask "tell me in detail about how something you worked on and were proud of works".
@GrantSlatton
Grant Slatton
5 months
People who complain about leetcode questions during the interview process need to put themselves in the shoes of the company. You have 1000 resumes to get through, at least 50% of whom can’t actually code. How would you filter through this stack in a reasonable amount of time?.
32
19
907
@_xjdr
xjdr
8 months
This is at least 60% of AI research.
@basedjensen
Hensen Juang
8 months
dawg they been gate keeping all this stuff behind mathy bs and badly named variables and functions with weird names it's just normal programming.
15
25
877
@_xjdr
xjdr
8 months
We've escaped containment
Tweet media one
19
19
860
@_xjdr
xjdr
4 months
please for the love of all that is holy stop RLHFing your models to produce summarized listicle slop. I am not writing a buzzfeed article in 2012, i am trying to solve a distributed system consensus problem. this tendency and the inane safety hobbling are going to be what drives.
35
32
871
@_xjdr
xjdr
8 months
Due to compounding cofounder health issues, on Oct 1st my ASI lab officially turned down our final cluster and closed its doors. There are so many things we were working on that i wish i could have shared with y'all that may never see the light of day now. The future is bright.
67
15
858
@_xjdr
xjdr
3 months
lolol
Tweet media one
23
10
859
@_xjdr
xjdr
8 months
we are getting really close to something good here .
Tweet media one
40
27
840
@_xjdr
xjdr
5 months
12 days was probably too many.
28
6
840
@_xjdr
xjdr
11 months
If you've been following this account for very long, you'll know that i stopped finetuning ~8 months ago. Interesting DM paper supporting this. This is pretty much consistent with my findings except i've rarely needed 2048 examples to get good perf.
16
75
784
@_xjdr
xjdr
3 months
i have upgraded to 4 claude code sessions working in parallel in a single tmux session, each on their own feature branch and then another tmux window with yet another claude in charge of merging and resolving merge conflicts
51
30
794
@_xjdr
xjdr
11 months
when i started thinking of prompts as queries into a highly compressed (lossy) database, my frustration with open source LLMs mostly went away and my results got much better.
31
50
768
@_xjdr
xjdr
3 months
GPT 4.5 and DeepResearch have grown on me considerably but $200 / month is way too much for the value i am getting.
46
10
792
@_xjdr
xjdr
3 months
If they didn't at least pick up R1 and V3 and make it better with their fucking rockstar team and 150000 GPUs everyone should be fired on the spot and I will hear no more about it.
@tsarnick
Tsarathustra
3 months
Elon Musk says Grok 3 will be released in "a week or two" and it is "scary smart", displaying reasoning skills that outperform any other AI model that has been released
16
10
774
@_xjdr
xjdr
10 months
"I . worked on this for a year . and . he just . he tweeted it out. "
Tweet media one
18
17
749
@_xjdr
xjdr
6 months
whalebros cooked here. Not only does it seem to replicate the o1-preview results, it seems to pretty effectively replicate (at least parts of) the process. My guess is it uses something very similar to the lets verify step-by-step ORMs / PRMs to train and reward the CoT in.
@deepseek_ai
DeepSeek
6 months
🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!
🔍 o1-preview-level performance on AIME & MATH benchmarks.
💡 Transparent thought process in real-time.
🛠️ Open-source models & API coming soon!
🌐 Try it now at #DeepSeek
Tweet media one
16
25
769
@_xjdr
xjdr
3 months
i am better than the AIs i use at python and C++ (i better be, i spent a lifetime practicing). When i am pair programming or collaborating with a model in those languages, i kind of just subconsciously gloss over the dumb things it does and correct them in my head but generally.
46
27
763
@_xjdr
xjdr
3 months
This is how i've been doing my cuda / ptx work for the last few weeks and i can both attest to R1 being particularly cracked at it AND that if you actually run a benchmark / compiler in the loop it does much better than you could possibly imagine. is this fast takeoff? almost.
@abacaj
anton
3 months
uh it might be over. they put r1 in a loop for 15minutes and it generated: "better than the optimized kernels developed by skilled engineers in some cases"
Tweet media one
17
34
762
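The "benchmark / compiler in the loop" setup is just a refinement loop: ask the model for a kernel, compile and time it, and feed the errors or timings back into the next prompt. A rough sketch under those assumptions; `ask_model` is a hypothetical stand-in for whatever chat API is used, and each candidate is assumed to carry its own benchmarking `main()`.

```python
import pathlib
import subprocess
import tempfile

def ask_model(prompt: str) -> str:
    raise NotImplementedError("call your LLM of choice here")  # hypothetical placeholder

def compile_and_run(cuda_src: str) -> tuple[bool, str]:
    """Compile a candidate kernel with nvcc and run its built-in benchmark."""
    work = pathlib.Path(tempfile.mkdtemp())
    src = work / "kernel.cu"
    src.write_text(cuda_src)
    build = subprocess.run(["nvcc", "-O3", str(src), "-o", str(work / "bench")],
                           capture_output=True, text=True)
    if build.returncode != 0:
        return False, build.stderr                 # feed compiler errors back to the model
    run = subprocess.run([str(work / "bench")], capture_output=True, text=True)
    return run.returncode == 0, run.stdout         # stdout assumed to print timings

prompt = "Write a CUDA kernel (with a main() that benchmarks it) for ..."
for _ in range(8):                                 # a handful of refinement rounds
    candidate = ask_model(prompt)
    ok, feedback = compile_and_run(candidate)
    if ok:
        break
    prompt += f"\n\nThe previous attempt failed or was slow:\n{feedback}\nFix it."
```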
@_xjdr
xjdr
3 months
wow. first few prompts with Sonnet 3.7 Extended (thinking edition) are insanely impressive. it is very clear that software development generally was a huge focus with this model. i need to do way more testing but if it continues to do what it just did . i will have much to say.
14
20
755
@_xjdr
xjdr
10 months
in my opinion, anyone that is really interested in AI should learn jax, not because its better than pytorch or whatever (although i think it is) but because it forces you to learn a lot of the core math and functions deeply to be able to use it.
28
34
723
@_xjdr
xjdr
5 months
Llamas . Tokenizer Free?! USING ENTROPY STEERING?!?!! . sometimes the universe conspires to make a paper just for you and it feels wonderful when it happens.
@ArtidoroPagnoni
Artidoro Pagnoni
5 months
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 . Paper 📄 Code 🛠️
Tweet media one
12
38
718
@_xjdr
xjdr
7 months
This is potentially a very significant discovery for a lot of reasons. For now, it's safe to say that entropy based sampling and training techniques are shaping up to be unreasonably effective at combatting entropy collapse and hallucinations in current models.
@doomslide
doomslide
7 months
a few days ago @_xjdr and i discovered that each llm (the ones we tested at least) has a unique, stable entropy/varentropy characteristic which is reproducible from *entirely random* hidden state prompts.
Tweet media one
Tweet media two
Tweet media three
24
64
706
@_xjdr
xjdr
9 months
if you knew the level of AI that most frontier lab employees had access to, y'all would be furious. If y'all realized that you could more or less replicate it with open weights models if you knew enough and tried hard enough, you'd be even more furious . .
35
34
693
@_xjdr
xjdr
2 months
some things i know today i didn't know a month ago:
- new deepseek v3 is as good or better than sonnet 3.7
- B200 (SM100) PTX is a step function change and improvement over H100 (SM90)
- DeepSeek absolutely cooked with FlashMLA and DeepGEMM but left a lot on the table
- y'all.
21
20
704
@_xjdr
xjdr
8 months
my 4090 died over night. I actually think i managed to work it to death. RIP king, you've served the cause with distinction and honor.
50
11
668
@_xjdr
xjdr
7 months
Nemotron-70B entropix edition is pretty fucking good
Tweet media one
35
26
667
@_xjdr
xjdr
3 months
TL;DR grok3 is fine and passes the vibe check of frontier level quality but its not better than R1 or o1-pro for me for most things i do. overall much better than i had expected, i put it in the gemini category but for me its still pretty far below the usefulness of R1 and the.
35
24
672
@_xjdr
xjdr
6 months
i dont even care if this is a psyop anymore. i am team whatever this is.
@Jiankui_He
Jiankui He
6 months
My new lab in Beijing.
Tweet media one
21
15
647
@_xjdr
xjdr
4 months
will make a formal announcement soon with all the sordid details, but we are actively hiring. the work will obviously involve model training, distributed inference, test time compute and RL post training etc. but it will also be so much more (and so much more ambitious). we are.
121
33
661
@_xjdr
xjdr
2 months
programming with AI is significantly more productive than programming without but it is so much harder. it tests and stresses all my skills every day. if you think its easier i have to imagine you are doing it wrong. tiger mom coding requires constant diligence and supervision.
51
39
652
@_xjdr
xjdr
7 months
new sonnet down bad
Tweet media one
17
18
631
@_xjdr
xjdr
7 months
partial list of things big labs use but do not share details about:
- pretraining data composition
- synthetic data generation pipelines
- RL{AI,H}F objectives
- "Reasoning"
- distillation
- test time compute scaling
this is also probably a list of things you should be thinking about.
21
40
641
@_xjdr
xjdr
10 months
I just sat through Accenture's enterprise AI pitch. Lol. Lmao even.
38
9
617
@_xjdr
xjdr
11 months
watching @yacineMTB discover terence tao, zig and nvim out loud in real time is like watching a baby deer learn to walk. nature really is beautiful.
17
5
626
@_xjdr
xjdr
4 months
The next 6 months are pretty clear just like the last 6 months were if you were paying attention. Look for the words not spoken or some shit but getting left behind is 100% a skill issue now.
17
21
641
@_xjdr
xjdr
6 months
If deepseek ended up pulling this off using just RL + something similar to the DeepSeekv2-lite model in just a few months, the implications for almost all large labs might be pretty hard to overstate. If the eventual open source version can approximate these results at reasonable.
@deepseek_ai
DeepSeek
6 months
🌟 Impressive Results of DeepSeek-R1-Lite-Preview Across Benchmarks!
Tweet media one
15
36
629
@_xjdr
xjdr
4 months
Tweet media one
@markchen90
Mark Chen
4 months
Leadership is forged through fire.
6
15
627
@_xjdr
xjdr
8 months
i made a repo, its very naive as i wasn't planning on releasing this when i started. This does not have the new sampler yet, but i will add it once its stable. It has both the jax and pytorch implementations. If y'all want to make it better, submit PRs.
23
47
619
@_xjdr
xjdr
4 months
This is an `e=mc^2 + AI` level breakthrough. Please respond with wire instructions, will beat any offers.
@ZachWarunek
Zach Warunek
4 months
Would this work? LLMs basically eating all micro service bullshit
Tweet media one
13
5
616
@_xjdr
xjdr
8 months
We've done a lot of testing today. Based on our results, I did a bunch of back of napkin calcs and i have no idea how anyone besides google, meta OAI and Anthropic can afford to do research anymore. Just evals and inference is tens to hundreds of millions of dollars now . .
23
21
618
@_xjdr
xjdr
3 months
It would take a long ass article to articulate this properly but this is not a vaguepost. I have spent the last few months working on some very hard problems (more on that soon). I've been using a combination of R1 and DeepResearch to build and formalize the ideas and proofs.
@_xjdr
xjdr
3 months
as much as i am paying attention to AI each and every day, the future snuck up on me last night and this is Day 0 of a brand new world. i can confidently say that now.
33
22
628
@_xjdr
xjdr
8 months
Sorry for the sorry state of the entropix repo, i unexpectedly had to be heads down on some last min lab closure mop up work and was AFK. Now that i have some compute again (HUGE shout outs to @0xishand, @Yuchenj_UW and @evanjconrad) we're in the amazing position that we need.
32
27
605
@_xjdr
xjdr
5 months
if you will forgive a bit of anthropomorphizing, to the extent models think, they do not think in tokens. tokens are merely our samplers' best interpretation of the probability distributions that emerge from the actual 'thinking' process. they're an incredibly low bandwidth.
54
30
617
@_xjdr
xjdr
3 months
"Good morning Claude! Please take a look at the project board, the issues you've been assigned and the open PR's for this repo. Lets develop a plan to assign each of the relevant tasks to claude workers 1 - 5 and LETS GET TO WORK BUDDY!".
18
23
600
@_xjdr
xjdr
9 months
after weeks of intense labor in the eval gulags, i can confidently and unequivocally say that 405B instruct at BF16 (f32 softmax) with vanilla attention, scaled rope and best of N sampling (N at ~5) is the best available model for the things i do most often (code, agents, etc).
16
25
587
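Best-of-N here just means: sample N (~5) completions and keep the one a scoring signal prefers. A minimal sketch, with `generate` and `score` as hypothetical placeholders for the inference call and the ranking signal (verifier, reward model, mean log-prob, ...).

```python
def best_of_n(prompt: str, generate, score, n: int = 5) -> str:
    """Sample n completions and return the one the scoring signal likes best."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)
```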
@_xjdr
xjdr
7 months
GM, looks like this morning's plans are now cancelled and we are throwing this into entropix.
20
21
572
@_xjdr
xjdr
5 months
lol, i just said "not your weights not your waifu" in a normal conversation. i think x has rotted my brain.
23
13
565
@_xjdr
xjdr
3 months
a bunch of people have been pinging me about R1 / Deep Research / Grok3. i have made a few observations after seeing their prompts:
- these models are bad at basically the same things
- these models are good at different and unique things
- most people dont have interesting.
14
16
569
@_xjdr
xjdr
8 months
wow, this is kind of wild to wake up to. Thank you to everyone who starred entropix!
Update:
There are so many things i want to try right now but i am limiting myself to working on adding more diverse prompts to the repo for testing and evals so we can move beyond vibes to
Tweet media one
13
14
565
@_xjdr
xjdr
3 months
if scale was really all you needed amazon and microsoft wouldn't need to use other people's models and google would be winning in every way.
@AnthropicAI
Anthropic
3 months
Claude will help power Amazon's next-generation AI assistant, Alexa+. Amazon and Anthropic have worked closely together over the past year, with @mikeyk leading a team that helped Amazon get the full benefits of Claude's capabilities.
Tweet media one
28
19
568
@_xjdr
xjdr
5 months
someone asked me how i think o3 works and i just responded with this
21
27
556
@_xjdr
xjdr
7 months
It's all coming together now. Slowly at first and then . .
@MLStreetTalk
Machine Learning Street Talk
1 year
"What's the Magic Word? A Control Theory of LLM Prompting"
Tweet media one
Tweet media two
Tweet media three
Tweet media four
11
23
544
@_xjdr
xjdr
6 months
How is he so good at posting? Also, who is taking these pictures?
@Jiankui_He
Jiankui He
6 months
Japan will be the second country to allow gene editing before birth.
Tweet media one
26
1
530
@_xjdr
xjdr
10 months
- GDM is now leading the AGI race
- Llama3.1 changed everything and Llama4 is the most important model in the world right now in terms of potential impact (short of AGI has been achieved internally announcements)
- real talk, if with Noam can't make it on
Tweet media one
22
42
534
@_xjdr
xjdr
11 months
weekend reading list:
10
46
533
@_xjdr
xjdr
7 months
no wasted movement. in all its 1B glory.
Tweet media one
13
10
532
@_xjdr
xjdr
1 month
i gave claude code an "ask gemini" and an "ask deepseek" MCP server. this should be interesting . .
@_xjdr
xjdr
1 month
i have decided to ask claude to prompt gemini for me. i am ashamed to say it is working out much better than my previous attempts.
17
17
540
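An "ask gemini" tool like this can be a very small MCP server. A sketch assuming the FastMCP helper from the Python MCP SDK; `call_gemini` is a hypothetical placeholder for whatever Gemini client is actually wired in.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ask-gemini")

def call_gemini(prompt: str) -> str:
    raise NotImplementedError("wire up your Gemini client here")  # hypothetical placeholder

@mcp.tool()
def ask_gemini(question: str) -> str:
    """Forward a question from Claude to Gemini and return its answer."""
    return call_gemini(question)

if __name__ == "__main__":
    mcp.run()  # point Claude Code at this server as an MCP tool
```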
@_xjdr
xjdr
8 months
to double down on this, the specific original goal was to see what we could accomplish with a vanilla OSS model without touching the weights or the architecture at all. This is a series of inference time compute experiments that essentially use the model outputs as read only.
@kalomaze
kalomaze
8 months
@_xjdr is measuring the total variation in all the token choices per individual prediction and using that as a heuristic. you can actually visualize this measurement. this is not an architectural tweak, this is doing fancy state modification based off that
Tweet media one
14
18
529
@_xjdr
xjdr
8 months
2 independent early verifications seem to suggest . .
@basedjensen
Hensen Juang
8 months
lol huh did we accidentally just fix hallucination problem ?.
18
13
511
@_xjdr
xjdr
11 months
more people should try to write code in C+. Basically write C but use a C++ compiler, std::vector, std::shared_ptr and real structs. 90% of the benefit of C++ (for most people) 10% of the complexity.
36
22
515
@_xjdr
xjdr
10 months
One of Jeff Dean's super powers is to be able to come up with reasonable approximations for very complex problems quickly. He also has the "latency numbers every engineer should know" that helped him reason about map reduce, search indexes, etc for this reason as well. Incredibly
Tweet media one
@JeffDean
Jeff Dean
10 months
It's really great to see the impact that TPUs have had and continue to have on Google's ability to do machine learning training and inference at scale, and to provide that same capability to @googlecloud customers via Cloud TPUs. Here's a bit of backstory on how they came to be.
Tweet media one
5
33
516
@_xjdr
xjdr
4 months
What a lot of y'all are feeling now is what I felt the first time I played with Llama 3 405B Base and basically shifted all my work and focus accordingly overnight. DeepSeeks trajectory has been very clear since the DeepSeek Math paper and the DeepSeekv2 model release.
10
13
514
@_xjdr
xjdr
6 months
i think its worth taking a moment to put into perspective how cool this work is. GPT2 is really what the entire OpenAI empire was built on / was deemed too dangerous to release a few short years ago and it is now reproducible in less than 8 min on a single (large) machine.
@kellerjordan0
Keller Jordan
6 months
New NanoGPT training speed record: 3.28 FineWeb val loss in 7.23 minutes on 8xH100. Previous record: 7.8 minutes.Changelog:.- Added U-net-like connectivity pattern.- Doubled learning rate. This record is by @brendanh0gan
Tweet media one
13
34
514
@_xjdr
xjdr
24 days
for those of you who may not understand why this prover v2 release is potentially so important (beyond solving hard math problems): this type of formalization is most likely the precursor to / requirement for formalizing things like code, etc at scale.
8
23
514
@_xjdr
xjdr
10 months
released prompt poet a library which provides not only a nice prompt templating system but also implements one of the more interesting parts of the original optimizing inference blog post, the cache aware prompt truncation for their tree style prompt cache
Tweet media one
Tweet media two
6
54
494
@_xjdr
xjdr
3 months
watching @qtnx_ being completely disgusted with open source research after 3 days at a real lab is both hilarious and a good reminder.
16
8
500
@_xjdr
xjdr
7 months
This is _so_ insane for a 1B model. Hensen cooked with middle out.
@basedjensen
Hensen Juang
7 months
This is pretty insane for 1b model
Tweet media one
11
11
487
@_xjdr
xjdr
7 months
Entropix Update:
I have deleted version 2 of the entropix-local refactor and have convinced myself that the 3rd time is the charm. i want to focus on user experience and UI with an emphasis on research and exploration. It should be dead simple for someone to clone the repo and.
29
28
489
@_xjdr
xjdr
8 months
hmmmm, i wonder what they did for the 100% .
Tweet media one
31
6
473
@_xjdr
xjdr
1 month
i have decided to ask claude to prompt gemini for me. i am ashamed to say it is working out much better than my previous attempts.
28
5
493
@_xjdr
xjdr
18 days
Gemini did learn how to use tools. my usage of claude outside of claude code was already virtually 0 but now it is going to be exactly 0.
@_xjdr
xjdr
18 days
gemini might have learned how to use tools . this could be a very very exciting development.
14
8
496
@_xjdr
xjdr
9 months
this is an awesome paper everyone should read.
20
43
482
@_xjdr
xjdr
3 months
hahahaha what?!?!
"The test cluster comprised 25 storage nodes (2 NUMA domains/node, 1 storage service/NUMA, 2×400Gbps NICs/node) and 50 compute nodes (2 NUMA domains, 192 physical cores, 2.2 TiB RAM, and 1×200 Gbps NIC/node). Sorting 110.5 TiB of data across 8,192 partitions.
@deepseek_ai
DeepSeek
3 months
🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access
Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.
⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min.
12
15
494