TilliFe
@tilli_fe
Followers 191 · Following 308 · Media 5 · Statuses 45
Creator of the Nabla framework • AI Compilers, Differentiable Programming & Haute Cuisine
Joined December 2018
We know that one of the biggest barriers to programming GPUs is access to hardware: "Code you’ve written for NVIDIA or AMD GPUs should now mostly just work on an Apple🍎 Silicon GPU, assuming no device-specific features were being used." Preview here:👇 https://t.co/Rsk9GNLq4U
forum.modular.com
The latest nightly releases of Mojo (and our next stable release) include initial support for a new accelerator architecture: Apple Silicon GPUs! We know that one of the biggest barriers to program...
17
73
797
Part 4 of "Matrix Multiplication on Blackwell" is here! It continues our epic journey of describing how Modular implemented the fastest B200 matmul in the industry, revealing the techniques to achieve 1772 TFLOPs, exceeding that of the current SOTA. https://t.co/jhBeJBvmuc
modular.com
In this blog post, we’ll continue our journey to build a state-of-the-art (SOTA) matmul kernel on NVIDIA Blackwell by exploring the cluster launch control (CLC) optimization. At the end of the post...
6
19
134
This post culminates our deep dive into Blackwell's advanced architecture, showing that the OSS Mojo🔥 matmul impl is ~6% faster than the proprietary CUDA cuDNN implementation. The Mojo impl can also be fused and optimized by the MAX graph compiler! Can you make it go faster?🚀
Part 4 of "Matrix Multiplication on Blackwell" is here! It continues our epic journey of describing how Modular implemented the fastest B200 matmul in the industry, revealing the techniques to achieve 1772 TFLOPs, exceeding that of the current SOTA. https://t.co/jhBeJBvmuc
8
35
362
Learn more about the project:
nablaml.com
Automatic differentiation, JIT compilation, and GPU acceleration in Python with Mojo and MAX.
0
2
4
Nabla is now on GitHub Sponsors 🥳 We are building a fast, customizable & educational ML framework. Our roadmap: Automated ND-parallelism, reviving the Nabla Mojo API, and one more thing to bring AI training to non-coders... Help us build the future of ML:
github.com
Support nabla-ml's open source work
1
2
7
Here is a glimpse into the newly added GPU support for Nabla 🤗: Automatic device placement, custom Mojo kernel integration, and huge speedups on modern @AMD and @nvidia hardware. Shout out to @Modular and @LambdaAPI for making this all possible! More: https://t.co/oWWcvHGEkr
1
4
10
I am starting a (notebook) series on training transformers with Nabla. Part 1 is a side-by-side (Nabla vs. JAX) toy implementation from scratch: https://t.co/RxSX9urQEi
0
5
14
Great work building on top of MAX and Mojo, bringing a cool new approach to AI training into the Modular ecosystem. Amazing work @tilli_fe!
JAX vs. Nabla: An initial speed comparison (on CPU) for training an MLP on a simple regression task. 🤗 The full Notebook: https://t.co/yuwuMNyHxj
2
8
79
Automatic Vectorization (vmap) in action: Write a program once, then use it for any batched input. If applied correctly, this can greatly reduce the number of for-loops and speed up a program. 🎓 Learn more about visualizing program transformations: https://t.co/qK2yPpkqly
0
3
16
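A minimal JAX sketch of the vmap idea from the post above; the post itself uses Nabla, so jax.vmap here is just the JAX equivalent of the same transform:

import jax
import jax.numpy as jnp

def predict(w, x):
    # Written once, for a single input vector x of shape (3,).
    return jnp.tanh(w @ x)

w = jnp.ones((4, 3))
xs = jnp.ones((8, 3))   # a batch of 8 inputs

# vmap lifts the single-example function to the whole batch,
# replacing an explicit Python for-loop over the 8 examples.
batched_predict = jax.vmap(predict, in_axes=(None, 0))
print(batched_predict(w, xs).shape)   # (8, 4)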
The screenshot shows the final result of the provided notebook, running on an Apple M3 (16GB).
0
0
2
JAX vs. Nabla: An initial speed comparison (on CPU) for training an MLP on a simple regression task. 🤗 The full Notebook: https://t.co/yuwuMNyHxj
3
4
27
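For context, a minimal JAX sketch of the kind of workload being compared (a tiny MLP trained with grad + jit on a 1-D regression task); this is illustrative only and not the notebook's actual code:

import jax
import jax.numpy as jnp

def mlp(params, x):
    # Two-layer MLP: 1-D input -> 32 tanh units -> 1-D output.
    (w1, b1), (w2, b2) = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

def loss(params, x, y):
    return jnp.mean((mlp(params, x) - y) ** 2)

@jax.jit
def step(params, x, y, lr):
    # One step of plain gradient descent on the MSE loss.
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.uniform(k3, (256, 1))
y = jnp.sin(2 * jnp.pi * x)
params = [(0.1 * jax.random.normal(k1, (1, 32)), jnp.zeros(32)),
          (0.1 * jax.random.normal(k2, (32, 1)), jnp.zeros(1))]

for _ in range(1000):
    params = step(params, x, y, 1e-2)
print(loss(params, x, y))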
I am reverse-engineering JAX from scratch in Python, but instead of using XLA, I am using NumPy @numpy_team and MAX @Modular for CPU/GPU acceleration. 🐍🫦 Working: function transforms like vmap, grad, jit, etc., some built-in nn modules, and a pip-installable package. https://t.co/YPS3CSm67d
github.com
Machine Learning library for the emerging Mojo/Python ecosystem - nabla-ml/nabla
1
3
20
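A hypothetical usage sketch of what such a JAX-mirroring surface could look like; the `import nabla as nb` alias and the exact transform names and signatures below are assumptions, so see the linked repo for the real API:

# Hypothetical sketch only: assumes Nabla exposes JAX-style transforms
# (grad, jit, vmap) and NumPy-like ops under `import nabla as nb`.
# The real import path and signatures may differ -- check nabla-ml/nabla.
import nabla as nb

def loss(w, x):
    return nb.sum(nb.sin(w * x) ** 2)

grad_loss = nb.jit(nb.grad(loss))   # reverse-mode AD, compiled via MAX
batched = nb.vmap(loss)             # one program, mapped over a batch axis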
The correct answer: (2, 2, 3, 2, 3) 🤯 Essentially, each differentiation step d(Output)/d(Input) results in a tensor with shape (shape_Output, shape_Input).
Input: (2, 3)
vmap(sum(sin(X))) maps to (2,)
Jacobian maps to (2, 2, 3)
Hessian maps to (2, 2, 3, 2, 3)
https://t.co/f8s56KiaBs
0
0
5
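A quick JAX check of that shape arithmetic (the post itself uses Nabla; jacfwd here stands in for whichever Jacobian transform you prefer):

import jax
import jax.numpy as jnp

f = lambda x: jnp.sum(jnp.sin(x))   # (3,) -> scalar
batched = jax.vmap(f)               # (2, 3) -> (2,)

X = jnp.ones((2, 3))
print(batched(X).shape)                           # (2,)
print(jax.jacfwd(batched)(X).shape)               # (2, 2, 3)
print(jax.jacfwd(jax.jacfwd(batched))(X).shape)   # (2, 2, 3, 2, 3)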
Say hello to the Modular forum, home to many Mojo-related discussions. https://t.co/MEwpY9xLMq
forum.modular.com
Today we are releasing a research preview of NABLA - a framework for differentiable programming in Mojo. Nabla aims to bring to Mojo what parts of JAX and PyTorch brought to Python: a high-level API...
0
1
4
Unlike previous attempts (e.g. Endia) that failed by trying to rebuild the entire stack, Nabla was designed from the ground up as a direct wrapper around Mojo and MAX, so it inherits their performance guarantees. Code, examples, and our roadmap:
github.com
Machine Learning library for the emerging Mojo/Python ecosystem - nabla-ml/nabla
1
1
7
Introducing NABLA - Differentiable Programming in Mojo
A Research Preview
Nabla aims to bring to Mojo what parts of JAX/PyTorch brought to Python: a high-level API for general program transformations, including vmap, jit, vjp, jvp & grad. Learn more:
nablaml.com
Automatic differentiation, JIT compilation, and GPU acceleration in Python with Mojo and MAX.
2
2
11
Today is the day. I am really happy to finally release this after months of work! https://t.co/2vz21XQrOF
Introducing NABLA - Differentiable Programming in Mojo
A Research Preview
Nabla aims to bring to Mojo what parts of JAX/PyTorch brought to Python: a high-level API for general program transformations, including vmap, jit, vjp, jvp & grad. Learn more:
0
1
13