TilliFe

@tilli_fe

Followers: 191 · Following: 308 · Media: 5 · Statuses: 45

Creator of the Nabla framework • AI Compilers, Differentiable Programming & Haute Cuisine

Joined December 2018
@clattner_llvm
Chris Lattner
2 months
We know that one of the biggest barriers to programming GPUs is access to hardware: "Code you’ve written for NVIDIA or AMD GPUs should now mostly just work on an Apple🍎 Silicon GPU, assuming no device-specific features were being used." Preview here:👇 https://t.co/Rsk9GNLq4U
forum.modular.com
The latest nightly releases of Mojo (and our next stable release) include initial support for a new accelerator architecture: Apple Silicon GPUs! We know that one of the biggest barriers to program...
17 replies · 73 retweets · 797 likes
@Modular
Modular
2 months
Part 4 of "Matrix Multiplication on Blackwell" is here! It continues our epic journey of describing how Modular implemented the fastest B200 matmul in the industry, revealing the techniques to achieve 1772 TFLOPs, exceeding that of the current SOTA. https://t.co/jhBeJBvmuc
modular.com
In this blog post, we’ll continue our journey to build a state-of-the-art (SOTA) matmul kernel on NVIDIA Blackwell by exploring the cluster launch control (CLC) optimization. At the end of the post...
6 replies · 19 retweets · 134 likes
@clattner_llvm
Chris Lattner
2 months
This post culminates our deep dive into Blackwell's advanced architecture, showing that the OSS Mojo🔥 matmul impl is ~6% faster than the proprietary CUDA cuDNN implementation. The Mojo impl can also be fused and optimized by the MAX graph compiler! Can you make it go faster?🚀
@Modular
Modular
2 months
Part 4 of "Matrix Multiplication on Blackwell" is here! It continues our epic journey of describing how Modular implemented the fastest B200 matmul in the industry, revealing the techniques to achieve 1772 TFLOPs, exceeding that of the current SOTA. https://t.co/jhBeJBvmuc
8 replies · 35 retweets · 362 likes
@nablaml
Nabla
3 months
Nabla is now on GitHub Sponsors 🥳 We are building a fast, customizable & educative ML framework. Our roadmap: Automated ND-parallelism, reviving the Nabla Mojo API, and one more thing to bring AI training to non-coders... Help us build the future of ML:
github.com
Support nabla-ml's open source work
1 reply · 2 retweets · 7 likes
@nablaml
Nabla
4 months
Here is a glimpse into the newly added GPU support for Nabla 🤗: Automatic device placement, custom Mojo kernel integration, and huge speedups on modern @AMD and @nvidia hardware. Shout out to @Modular and @LambdaAPI for making this all possible! More: https://t.co/oWWcvHGEkr
1 reply · 4 retweets · 10 likes
@tilli_fe
TilliFe
4 months
Visualizing SGD doing its thing 🥾 Nabla + MAX + Matplotlib + NumPy: https://t.co/i0HU3i8qvW
0 replies · 2 retweets · 10 likes
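For a sense of what such a visualization involves, here is a minimal NumPy + Matplotlib sketch (illustrative only, not the notebook behind the linked plot): plain gradient descent on a 2-D quadratic bowl, with the optimization path drawn over the loss contours.

```python
# Hypothetical stand-in for the linked notebook: full-batch gradient descent
# on a 2-D quadratic loss, with the trajectory drawn over the loss contours.
import numpy as np
import matplotlib.pyplot as plt

def grad(w):
    # Gradient of the anisotropic bowl f(w) = 0.5 * (w0^2 + 10 * w1^2)
    return np.array([w[0], 10.0 * w[1]])

w = np.array([-2.5, 1.5])   # starting point
lr = 0.08                   # learning rate
path = [w.copy()]
for _ in range(40):
    w = w - lr * grad(w)
    path.append(w.copy())
path = np.array(path)

# Contour plot of the loss surface with the optimization path on top.
xs = np.linspace(-3, 3, 200)
ys = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(xs, ys)
Z = 0.5 * (X ** 2 + 10.0 * Y ** 2)
plt.contour(X, Y, Z, levels=20)
plt.plot(path[:, 0], path[:, 1], "o-", markersize=3)
plt.title("Gradient descent trajectory on a quadratic bowl")
plt.show()
```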
@tilli_fe
TilliFe
5 months
I am starting a (notebook) series on training transformers with Nabla. Part 1 is a side-by-side (Nabla vs. JAX) toy implementation from scratch: https://t.co/RxSX9urQEi
0 replies · 5 retweets · 14 likes
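As a rough idea of what a from-scratch transformer notebook starts with, the snippet below implements single-head scaled dot-product self-attention in JAX; the parameter names and shapes are illustrative assumptions, not code from the linked notebook.

```python
# Illustrative single-head self-attention in JAX -- a typical first building
# block in a from-scratch transformer notebook (names/shapes are assumptions).
import jax
import jax.numpy as jnp

def self_attention(params, x):
    # x: (seq_len, d_model)
    q = x @ params["Wq"]                       # (seq_len, d_k)
    k = x @ params["Wk"]                       # (seq_len, d_k)
    v = x @ params["Wv"]                       # (seq_len, d_k)
    scores = q @ k.T / jnp.sqrt(q.shape[-1])   # scaled dot-product scores
    weights = jax.nn.softmax(scores, axis=-1)  # attention weights per query
    return weights @ v                         # (seq_len, d_k)

key = jax.random.PRNGKey(0)
d_model, d_k, seq_len = 16, 8, 10
k1, k2, k3, k4 = jax.random.split(key, 4)
params = {
    "Wq": jax.random.normal(k1, (d_model, d_k)) * 0.1,
    "Wk": jax.random.normal(k2, (d_model, d_k)) * 0.1,
    "Wv": jax.random.normal(k3, (d_model, d_k)) * 0.1,
}
x = jax.random.normal(k4, (seq_len, d_model))
print(self_attention(params, x).shape)  # (10, 8)
```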
@clattner_llvm
Chris Lattner
5 months
Great work building on top of MAX and Mojo, bringing a cool new approach to AI training into the Modular ecosystem. Amazing work @tilli_fe!
@tilli_fe
TilliFe
5 months
JAX vs. Nabla: An initial speed comparison (on cpu) for training an MLP on a simple regression task. 🤗 The full Notebook: https://t.co/yuwuMNyHxj
2 replies · 8 retweets · 79 likes
@tilli_fe
TilliFe
5 months
Automatic Vectorization (vmap) in action: Write a program once, then use it for any batched input. If applied correctly, this can greatly reduce the number of for-loops and speed up a program. 🎓 Learn more about visualizing program transformations: https://t.co/qK2yPpkqly
0 replies · 3 retweets · 16 likes
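To make the vmap idea concrete, here is the equivalent JAX transform (JAX's vmap is the reference behavior the tweet refers to): a per-example prediction function is written once, then vectorized over a batch axis with no explicit loop.

```python
# vmap illustration in JAX: write the per-example function once, then
# vectorize it over a leading batch dimension without a Python for-loop.
import jax
import jax.numpy as jnp

def predict(w, x):
    # Single-example linear model: x has shape (features,)
    return jnp.dot(w, x)

w = jnp.array([1.0, 2.0, 3.0])
batch = jnp.arange(12.0).reshape(4, 3)   # 4 examples, 3 features each

# Loop-free batched version: map over axis 0 of `batch`, broadcast `w`.
batched_predict = jax.vmap(predict, in_axes=(None, 0))
print(batched_predict(w, batch))         # shape (4,)
```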
@tilli_fe
TilliFe
5 months
The screenshot shows the final result of the provided notebook, running on an Apple M3 (16GB).
0 replies · 0 retweets · 2 likes
@tilli_fe
TilliFe
5 months
JAX vs. Nabla: An initial speed comparison (on cpu) for training an MLP on a simple regression task. 🤗 The full Notebook: https://t.co/yuwuMNyHxj
3 replies · 4 retweets · 27 likes
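The benchmark notebook itself is only linked, not shown; as a hedged sketch of what the JAX side of such a comparison typically looks like, the snippet below jit-compiles one SGD step for a small MLP on a 1-D regression task (layer sizes, learning rate, and data are illustrative assumptions).

```python
# Illustrative JAX reference for an MLP-regression benchmark (not the
# notebook's exact code): one jit-compiled SGD step on y = sin(x).
import jax
import jax.numpy as jnp

def init_params(key, sizes):
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) * 0.1,
                       jnp.zeros(d_out)))
    return params

def mlp(params, x):
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def loss(params, x, y):
    return jnp.mean((mlp(params, x) - y) ** 2)

@jax.jit
def sgd_step(params, x, y, lr=1e-2):
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
x = jnp.linspace(-3, 3, 256).reshape(-1, 1)
y = jnp.sin(x)
params = init_params(key, [1, 64, 64, 1])
for step in range(1000):
    params = sgd_step(params, x, y)
print("final loss:", loss(params, x, y))
```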
@tilli_fe
TilliFe
5 months
I am reverse-engineering JAX from scratch in Python, but instead of using XLA, I am using NumPy @numpy_team and MAX @Modular for CPU/GPU acceleration. 🐍🫦 Working: Function transforms like vmap, grad, jit etc., some built-in nn/ modules, pip-install. https://t.co/YPS3CSm67d
github.com
Machine Learning library for the emerging Mojo/Python ecosystem - nabla-ml/nabla
1 reply · 3 retweets · 20 likes
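To illustrate the general idea of rebuilding JAX-style transforms on top of NumPy (a toy sketch only, not Nabla's actual implementation), here is a minimal reverse-mode grad transform that records a computation graph during the forward pass and propagates cotangents backwards through it.

```python
# Toy illustration of the "rebuild JAX on NumPy" idea (NOT Nabla's code):
# a tiny reverse-mode autodiff that records a graph and replays it backwards.
import numpy as np

class Var:
    def __init__(self, value, parents=(), backward_rules=()):
        self.value = np.asarray(value, dtype=float)
        self.parents = parents                  # upstream Vars
        self.backward_rules = backward_rules    # d(output)/d(parent) callables
        self.grad = 0.0

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   parents=(self, other),
                   backward_rules=(lambda g: g * other.value,
                                   lambda g: g * self.value))

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value,
                   parents=(self, other),
                   backward_rules=(lambda g: g, lambda g: g))

def sin(x):
    return Var(np.sin(x.value), parents=(x,),
               backward_rules=(lambda g: g * np.cos(x.value),))

def grad(f):
    """JAX-style transform: returns a function computing df/dx at x."""
    def grad_f(x):
        root = Var(x)
        out = f(root)
        # Reverse pass: push each cotangent contribution down to the leaves.
        stack = [(out, 1.0)]
        while stack:
            node, g = stack.pop()
            node.grad = node.grad + g
            for parent, rule in zip(node.parents, node.backward_rules):
                stack.append((parent, rule(g)))
        return root.grad
    return grad_f

# d/dx [sin(x) * x + x] = x*cos(x) + sin(x) + 1
print(grad(lambda x: sin(x) * x + x)(2.0))
```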
@tilli_fe
TilliFe
6 months
The correct answer: (2, 2, 3, 2, 3) 🤯 Essentially, each differentiation step d(Output)/d(Input) results in a tensor with shape (shape_Output, shape_Input).
Input: (2, 3)
vmap(sum(sin(X))) maps to (2,)
Jacobian maps to (2, 2, 3)
Hessian maps to (2, 2, 3, 2, 3)
https://t.co/f8s56KiaBs
@tilli_fe
TilliFe
6 months
Can you guess the output shape? 🤨
0 replies · 0 retweets · 5 likes
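The shape arithmetic above can be checked mechanically; the following JAX snippet (JAX used only as a convenient stand-in for whichever framework produced the screenshot) reproduces the (2,), (2, 2, 3), and (2, 2, 3, 2, 3) shapes by stacking Jacobians.

```python
# Verifying the shape rule: each differentiation step appends the input
# shape (2, 3) to the output shape.
import jax
import jax.numpy as jnp

# Row-wise sum(sin(row)): maps a (3,)-row to a scalar; vmapped over 2 rows.
f = jax.vmap(lambda row: jnp.sum(jnp.sin(row)))

X = jnp.ones((2, 3))                            # Input: (2, 3)
print(f(X).shape)                               # (2,)
print(jax.jacobian(f)(X).shape)                 # (2, 2, 3)
print(jax.jacobian(jax.jacobian(f))(X).shape)   # (2, 2, 3, 2, 3)
```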
@tilli_fe
TilliFe
6 months
Which one is it? Get a cookie if you are correct.
0 replies · 0 retweets · 3 likes
@tilli_fe
TilliFe
6 months
Can you guess the output shape? 🤨
1 reply · 2 retweets · 8 likes
@nablaml
Nabla
6 months
Unlike previous attempts (e.g. Endia) that failed by trying to rebuild the entire stack, Nabla was designed from the ground up as a direct wrapper around Mojo and MAX, providing the same performance guarantees as they do. Code, examples and our roadmap:
github.com
Machine Learning library for the emerging Mojo/Python ecosystem - nabla-ml/nabla
1 reply · 1 retweet · 7 likes
@nablaml
Nabla
6 months
Introducing NABLA - Differentiable Programming in Mojo (A Research Preview). Nabla aims to bring to Mojo what parts of JAX/PyTorch brought to Python: a high-level API for general program transformations, including vmap, jit, vjp, jvp & grad. Learn more:
nablaml.com
Automatic differentiation, JIT compilation, and GPU acceleration in Python with Mojo and MAX.
2 replies · 2 retweets · 11 likes
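For readers unfamiliar with the transform names in the announcement, the snippet below shows what grad, vjp, jvp, vmap, and jit each do on one small function, using JAX as the reference API that Nabla mirrors; Nabla's own signatures may differ.

```python
# The five transform names from the announcement, demonstrated with JAX
# (used here as the reference API; Nabla's Mojo/Python signatures may differ).
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(jnp.sin(x) * x)

x = jnp.arange(1.0, 4.0)            # shape (3,)

# grad: reverse-mode gradient of a scalar-valued function.
print(jax.grad(f)(x))

# vjp: vector-Jacobian product (the primitive behind grad).
y, f_vjp = jax.vjp(f, x)
print(f_vjp(jnp.ones_like(y)))      # same values as grad(f)(x)

# jvp: Jacobian-vector product (forward mode, a directional derivative).
y, tangent = jax.jvp(f, (x,), (jnp.ones_like(x),))
print(tangent)

# vmap: batch f over a leading axis without writing a loop.
batch = jnp.stack([x, 2 * x])
print(jax.vmap(f)(batch))           # shape (2,)

# jit: trace and compile f (here via XLA; Nabla lowers through MAX instead).
print(jax.jit(f)(x))
```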
@tilli_fe
TilliFe
6 months
Today is the day. I am really happy to finally release this after months of work! https://t.co/2vz21XQrOF
@nablaml
Nabla
6 months
Introducing NABLA - Differentiable Programming in Mojo (A Research Preview). Nabla aims to bring to Mojo what parts of JAX/PyTorch brought to Python: a high-level API for general program transformations, including vmap, jit, vjp, jvp & grad. Learn more:
0 replies · 1 retweet · 13 likes