TilliFe

@tilli_fe

Followers: 191 · Following: 308 · Media: 5 · Statuses: 45

Creator of the Nabla framework • AI Compilers, Differentiable Programming & Haute Cuisine

Joined December 2018
@clattner_llvm
Chris Lattner
2 months
We know that one of the biggest barriers to programming GPUs is access to hardware: "Code you’ve written for NVIDIA or AMD GPUs should now mostly just work on an Apple🍎 Silicon GPU, assuming no device-specific features were being used." Preview here:👇 https://t.co/Rsk9GNLq4U
forum.modular.com
The latest nightly releases of Mojo (and our next stable release) include initial support for a new accelerator architecture: Apple Silicon GPUs! We know that one of the biggest barriers to program...
17 replies · 73 retweets · 797 likes
@Modular
Modular
2 months
Part 4 of "Matrix Multiplication on Blackwell" is here! It continues our epic journey of describing how Modular implemented the fastest B200 matmul in the industry, revealing the techniques to achieve 1772 TFLOPs, exceeding that of the current SOTA. https://t.co/jhBeJBvmuc
modular.com
In this blog post, we’ll continue our journey to build a state-of-the-art (SOTA) matmul kernel on NVIDIA Blackwell by exploring the cluster launch control (CLC) optimization. At the end of the post...
6 replies · 19 retweets · 134 likes
@clattner_llvm
Chris Lattner
2 months
This post culminates our deep dive into Blackwell's advanced architecture, showing that the OSS Mojo🔥 matmul impl is ~6% faster than the proprietary CUDA cuDNN implementation. The Mojo impl can also be fused and optimized by the MAX graph compiler! Can you make it go faster?🚀
@Modular
Modular
2 months
Part 4 of "Matrix Multiplication on Blackwell" is here! It continues our epic journey of describing how Modular implemented the fastest B200 matmul in the industry, revealing the techniques to achieve 1772 TFLOPs, exceeding that of the current SOTA. https://t.co/jhBeJBvmuc
8 replies · 35 retweets · 362 likes
@nablaml
Nabla
3 months
Nabla is now on GitHub Sponsors 🥳 We are building a fast, customizable & educative ML framework. Our roadmap: Automated ND-parallelism, reviving the Nabla Mojo API, and one more thing to bring AI training to non-coders... Help us build the future of ML:
github.com
Support nabla-ml's open source work
1 reply · 2 retweets · 7 likes
@nablaml
Nabla
4 months
Here is a glimpse into the newly added GPU support for Nabla 🤗: Automatic device placement, custom Mojo kernel integration, and huge speedups on modern @AMD and @nvidia hardware. Shout out to @Modular and @LambdaAPI for making this all possible! More: https://t.co/oWWcvHGEkr
1 reply · 4 retweets · 10 likes
@tilli_fe
TilliFe
4 months
Visualizing SGD doing its thing 🥾 Nabla + MAX + Matplotlib + NumPy: https://t.co/i0HU3i8qvW
0 replies · 2 retweets · 10 likes
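For a sense of what such a visualization involves, here is a minimal NumPy + Matplotlib sketch (illustrative only, not the notebook behind the linked plot): plain gradient descent on a 2-D quadratic bowl, with the optimization path drawn over the loss contours.

```python
# Hypothetical stand-in for the linked notebook: full-batch gradient descent
# on a 2-D quadratic loss, with the trajectory drawn over the loss contours.
import numpy as np
import matplotlib.pyplot as plt

def grad(w):
    # Gradient of the anisotropic bowl f(w) = 0.5 * (w0^2 + 10 * w1^2)
    return np.array([w[0], 10.0 * w[1]])

w = np.array([-2.5, 1.5])   # starting point
lr = 0.08                   # learning rate
path = [w.copy()]
for _ in range(40):
    w = w - lr * grad(w)
    path.append(w.copy())
path = np.array(path)

# Contour plot of the loss surface with the optimization path on top.
xs = np.linspace(-3, 3, 200)
ys = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(xs, ys)
Z = 0.5 * (X ** 2 + 10.0 * Y ** 2)
plt.contour(X, Y, Z, levels=20)
plt.plot(path[:, 0], path[:, 1], "o-", markersize=3)
plt.title("Gradient descent trajectory on a quadratic bowl")
plt.show()
```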
@tilli_fe
TilliFe
5 months
I am starting a (notebook) series on training transformers with Nabla. Part 1 is a side-by-side (Nabla vs. JAX) toy implementation from scratch: https://t.co/RxSX9urQEi
0 replies · 5 retweets · 14 likes
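As a rough idea of what a from-scratch transformer notebook starts with, the snippet below implements single-head scaled dot-product self-attention in JAX; the parameter names and shapes are illustrative assumptions, not code from the linked notebook.

```python
# Illustrative single-head self-attention in JAX -- a typical first building
# block in a from-scratch transformer notebook (names/shapes are assumptions).
import jax
import jax.numpy as jnp

def self_attention(params, x):
    # x: (seq_len, d_model)
    q = x @ params["Wq"]                       # (seq_len, d_k)
    k = x @ params["Wk"]                       # (seq_len, d_k)
    v = x @ params["Wv"]                       # (seq_len, d_k)
    scores = q @ k.T / jnp.sqrt(q.shape[-1])   # scaled dot-product scores
    weights = jax.nn.softmax(scores, axis=-1)  # attention weights per query
    return weights @ v                         # (seq_len, d_k)

key = jax.random.PRNGKey(0)
d_model, d_k, seq_len = 16, 8, 10
k1, k2, k3, k4 = jax.random.split(key, 4)
params = {
    "Wq": jax.random.normal(k1, (d_model, d_k)) * 0.1,
    "Wk": jax.random.normal(k2, (d_model, d_k)) * 0.1,
    "Wv": jax.random.normal(k3, (d_model, d_k)) * 0.1,
}
x = jax.random.normal(k4, (seq_len, d_model))
print(self_attention(params, x).shape)  # (10, 8)
```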
@clattner_llvm
Chris Lattner
5 months
Great work building on top of MAX and Mojo, bringing a cool new approach to AI training into the Modular ecosystem. Amazing work @tilli_fe!
@tilli_fe
TilliFe
5 months
JAX vs. Nabla: An initial speed comparison (on cpu) for training an MLP on a simple regression task. 🤗 The full Notebook: https://t.co/yuwuMNyHxj
2 replies · 8 retweets · 79 likes
@tilli_fe
TilliFe
5 months
Automatic Vectorization (vmap) in action: Write a program once, then use it for any batched input. If applied correctly, this can greatly reduce the number of for-loops and speed up a program. 🎓 Learn more about visualizing program transformations: https://t.co/qK2yPpkqly
0 replies · 3 retweets · 16 likes
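To make the vmap idea concrete, here is the equivalent JAX transform (JAX's vmap is the reference behavior the tweet refers to): a per-example prediction function is written once, then vectorized over a batch axis with no explicit loop.

```python
# vmap illustration in JAX: write the per-example function once, then
# vectorize it over a leading batch dimension without a Python for-loop.
import jax
import jax.numpy as jnp

def predict(w, x):
    # Single-example linear model: x has shape (features,)
    return jnp.dot(w, x)

w = jnp.array([1.0, 2.0, 3.0])
batch = jnp.arange(12.0).reshape(4, 3)   # 4 examples, 3 features each

# Loop-free batched version: map over axis 0 of `batch`, broadcast `w`.
batched_predict = jax.vmap(predict, in_axes=(None, 0))
print(batched_predict(w, batch))         # shape (4,)
```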
@tilli_fe
TilliFe
5 months
The screenshot shows the final result of the provided notebook, running on an Apple M3 (16GB).
0 replies · 0 retweets · 2 likes
@tilli_fe
TilliFe
5 months
JAX vs. Nabla: An initial speed comparison (on cpu) for training an MLP on a simple regression task. 🤗 The full Notebook: https://t.co/yuwuMNyHxj
3 replies · 4 retweets · 27 likes
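The benchmark notebook itself is only linked, not shown; as a hedged sketch of what the JAX side of such a comparison typically looks like, the snippet below jit-compiles one SGD step for a small MLP on a 1-D regression task (layer sizes, learning rate, and data are illustrative assumptions).

```python
# Illustrative JAX reference for an MLP-regression benchmark (not the
# notebook's exact code): one jit-compiled SGD step on y = sin(x).
import jax
import jax.numpy as jnp

def init_params(key, sizes):
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) * 0.1,
                       jnp.zeros(d_out)))
    return params

def mlp(params, x):
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def loss(params, x, y):
    return jnp.mean((mlp(params, x) - y) ** 2)

@jax.jit
def sgd_step(params, x, y, lr=1e-2):
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
x = jnp.linspace(-3, 3, 256).reshape(-1, 1)
y = jnp.sin(x)
params = init_params(key, [1, 64, 64, 1])
for step in range(1000):
    params = sgd_step(params, x, y)
print("final loss:", loss(params, x, y))
```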
@tilli_fe
TilliFe
5 months
I am reverse-engineering JAX from scratch in Python, but instead of using XLA, I am using NumPy @numpy_team and MAX @Modular for CPU/GPU acceleration. 🐍🫦 Working: Function transforms like vmap, grad, jit etc., some built-in nn/ modules, pip-install. https://t.co/YPS3CSm67d
github.com
Machine Learning library for the emerging Mojo/Python ecosystem - nabla-ml/nabla
1 reply · 3 retweets · 20 likes
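To illustrate the general idea of rebuilding JAX-style transforms on top of NumPy (a toy sketch only, not Nabla's actual implementation), here is a minimal reverse-mode grad transform that records a computation graph during the forward pass and propagates cotangents backwards through it.

```python
# Toy illustration of the "rebuild JAX on NumPy" idea (NOT Nabla's code):
# a tiny reverse-mode autodiff that records a graph and replays it backwards.
import numpy as np

class Var:
    def __init__(self, value, parents=(), backward_rules=()):
        self.value = np.asarray(value, dtype=float)
        self.parents = parents                  # upstream Vars
        self.backward_rules = backward_rules    # d(output)/d(parent) callables
        self.grad = 0.0

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   parents=(self, other),
                   backward_rules=(lambda g: g * other.value,
                                   lambda g: g * self.value))

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value,
                   parents=(self, other),
                   backward_rules=(lambda g: g, lambda g: g))

def sin(x):
    return Var(np.sin(x.value), parents=(x,),
               backward_rules=(lambda g: g * np.cos(x.value),))

def grad(f):
    """JAX-style transform: returns a function computing df/dx at x."""
    def grad_f(x):
        root = Var(x)
        out = f(root)
        # Reverse pass: push each cotangent contribution down to the leaves.
        stack = [(out, 1.0)]
        while stack:
            node, g = stack.pop()
            node.grad = node.grad + g
            for parent, rule in zip(node.parents, node.backward_rules):
                stack.append((parent, rule(g)))
        return root.grad
    return grad_f

# d/dx [sin(x) * x + x] = x*cos(x) + sin(x) + 1
print(grad(lambda x: sin(x) * x + x)(2.0))
```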
@tilli_fe
TilliFe
6 months
The correct answer: (2, 2, 3, 2, 3) 🤯 Essentially, each differentiation step d(Output)/d(Input) results in a tensor with shape (shape_Output, shape_Input).
Input: (2, 3)
vmap(sum(sin(X))) maps to (2,)
Jacobian maps to (2, 2, 3)
Hessian maps to (2, 2, 3, 2, 3)
https://t.co/f8s56KiaBs
@tilli_fe
TilliFe
6 months
Can you guess the output shape? 🤨
0 replies · 0 retweets · 5 likes
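The shape arithmetic above can be checked mechanically; the following JAX snippet (JAX used only as a convenient stand-in for whichever framework produced the screenshot) reproduces the (2,), (2, 2, 3), and (2, 2, 3, 2, 3) shapes by stacking Jacobians.

```python
# Verifying the shape rule: each differentiation step appends the input
# shape (2, 3) to the output shape.
import jax
import jax.numpy as jnp

# Row-wise sum(sin(row)): maps a (3,)-row to a scalar; vmapped over 2 rows.
f = jax.vmap(lambda row: jnp.sum(jnp.sin(row)))

X = jnp.ones((2, 3))                            # Input: (2, 3)
print(f(X).shape)                               # (2,)
print(jax.jacobian(f)(X).shape)                 # (2, 2, 3)
print(jax.jacobian(jax.jacobian(f))(X).shape)   # (2, 2, 3, 2, 3)
```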
@tilli_fe
TilliFe
6 months
Which one is it? Get a cookie if you are correct.
0 replies · 0 retweets · 3 likes
@tilli_fe
TilliFe
6 months
Can you guess the output shape? 🤨
1 reply · 2 retweets · 8 likes
@nablaml
Nabla
6 months
Unlike previous attempts (e.g. Endia) that failed by trying to rebuild the entire stack, Nabla was designed from the ground up as a direct wrapper around Mojo and MAX, providing the same performance guarantees as they do. Code, examples and our roadmap:
github.com
Machine Learning library for the emerging Mojo/Python ecosystem - nabla-ml/nabla
1 reply · 1 retweet · 7 likes
@nablaml
Nabla
6 months
Introducing NABLA - Differentiable Programming in Mojo (A Research Preview). Nabla aims to bring to Mojo what parts of JAX/PyTorch brought to Python: a high-level API for general program transformations, including vmap, jit, vjp, jvp & grad. Learn more:
nablaml.com
Automatic differentiation, JIT compilation, and GPU acceleration in Python with Mojo and MAX.
2 replies · 2 retweets · 11 likes
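For readers unfamiliar with the transform names in the announcement, the snippet below shows what grad, vjp, jvp, vmap, and jit each do on one small function, using JAX as the reference API that Nabla mirrors; Nabla's own signatures may differ.

```python
# The five transform names from the announcement, demonstrated with JAX
# (used here as the reference API; Nabla's Mojo/Python signatures may differ).
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(jnp.sin(x) * x)

x = jnp.arange(1.0, 4.0)            # shape (3,)

# grad: reverse-mode gradient of a scalar-valued function.
print(jax.grad(f)(x))

# vjp: vector-Jacobian product (the primitive behind grad).
y, f_vjp = jax.vjp(f, x)
print(f_vjp(jnp.ones_like(y)))      # same values as grad(f)(x)

# jvp: Jacobian-vector product (forward mode, a directional derivative).
y, tangent = jax.jvp(f, (x,), (jnp.ones_like(x),))
print(tangent)

# vmap: batch f over a leading axis without writing a loop.
batch = jnp.stack([x, 2 * x])
print(jax.vmap(f)(batch))           # shape (2,)

# jit: trace and compile f (here via XLA; Nabla lowers through MAX instead).
print(jax.jit(f)(x))
```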
@tilli_fe
TilliFe
6 months
Today is the day. I am really happy to finally release this after months of work! https://t.co/2vz21XQrOF
@nablaml
Nabla
6 months
Introducing NABLA - Differentiable Programming in Mojo (A Research Preview). Nabla aims to bring to Mojo what parts of JAX/PyTorch brought to Python: a high-level API for general program transformations, including vmap, jit, vjp, jvp & grad. Learn more:
0 replies · 1 retweet · 13 likes