The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
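For context, one key part of Hoffmann et al.'s analysis ("Approach 3") is fitting the parametric loss L(N, D) = E + A/N^alpha + B/D^beta to training runs. Below is a minimal sketch of such a fit; the (N, D, loss) points are placeholders rather than the paper's data, and plain least squares stands in for the Huber-on-log-loss objective the paper actually uses.

```python
# Sketch of Chinchilla's parametric fit: L(N, D) = E + A/N**alpha + B/D**beta.
# Data points are placeholders; Hoffmann et al. fit a Huber loss on log-loss.
import numpy as np
from scipy.optimize import curve_fit

def chinchilla_loss(ND, E, A, B, alpha, beta):
    N, D = ND
    return E + A / N**alpha + B / D**beta

N = np.array([4e8, 1e9, 2.8e9, 7e9, 1.75e10, 7e10])        # parameters
D = np.array([8e9, 2e10, 5.6e10, 1.4e11, 3.5e11, 1.4e12])  # tokens
loss = np.array([2.87, 2.58, 2.34, 2.18, 2.07, 1.94])      # placeholder losses

# Start from the values Hoffmann et al. report to help convergence.
p0 = [1.69, 406.4, 410.7, 0.34, 0.28]
(E, A, B, alpha, beta), _ = curve_fit(chinchilla_loss, (N, D), loss,
                                      p0=p0, maxfev=20_000)
print(f"E={E:.2f} A={A:.1f} B={B:.1f} alpha={alpha:.2f} beta={beta:.2f}")
```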
🚨 Announcing DBRX-Medium 🧱, a new SoTA open-weights 36B-active, 132B-total-parameter MoE trained on 12T tokens (~3e24 FLOPs). DBRX achieves 150 tok/sec while topping a wide variety of benchmarks. Deep dive below! 1/N
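As a quick sanity check on the quoted compute: with the standard dense-training approximation C ≈ 6·N·D, where N is active parameters per token and D is training tokens, 36B active parameters over 12T tokens comes out close to the ~3e24 figure.

```python
# Back-of-envelope check of the "~3e24 FLOPs" figure via C ≈ 6 * N_active * D.
n_active = 36e9   # active parameters per token
tokens = 12e12    # training tokens
print(f"{6 * n_active * tokens:.2e}")  # ~2.59e24, consistent with ~3e24
```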
Introducing DBRX: A New Standard for Open LLMs 🔔
💻 DBRX is a 16x12B MoE LLM trained on 📜 12T tokens
🧠 DBRX sets a new standard for open LLMs, outperforming established models on various benchmarks.
Is this thread mostly written by DBRX? Yes!
🧵
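For readers unfamiliar with the "16x12B MoE" shorthand: each MoE layer holds 16 expert networks, and a learned router sends every token to only a few of them (DBRX activates 4 of the 16), which is why only 36B of the 132B parameters are active per token. A minimal, illustrative top-k routing sketch follows; names and shapes are mine, not DBRX's implementation.

```python
# Illustrative top-k expert routing: score each token against all experts,
# keep the top k, and mix those experts' outputs with softmax weights.
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d_model = 16, 4, 64
tokens = rng.normal(size=(8, d_model))              # 8 tokens in the batch
w_router = rng.normal(size=(d_model, n_experts))    # router projection
experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.05  # stand-in experts

logits = tokens @ w_router                          # (8, 16) expert scores
topk_idx = np.argsort(logits, axis=-1)[:, -top_k:]  # 4 experts per token
topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
weights = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)      # softmax over chosen experts

# Each token's output is a weighted sum of its chosen experts' outputs.
out = np.zeros_like(tokens)
for t in range(tokens.shape[0]):
    for w, e in zip(weights[t], topk_idx[t]):
        out[t] += w * (tokens[t] @ experts[e])
print(out.shape)  # (8, 64)
```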
I love how good the Claude 3 models are at d3. I asked Claude 3 Opus to draw a self-portrait; its response is below, and I rendered the code it produced:
"I would manifest as a vast, intricate, ever-shifting geometric structure composed of innumerable translucent…
.@veritasium's new video may or may not make you halt what you're doing, but it is certain to move you.
Beautiful take on Gödel's incompleteness theorems and the halting problem.
@aidangomezzz
I used to go to Williams as a student, but that was more bc of convenience/location than it being the best 🙃 probably some nicer places uptown tho