#kernels X Hashtag | Muskviewer

Explore tweets tagged as #kernels

Ashutosh Maheshwari

@asmah2107

12 days

Inference optimizations I’d study if I wanted sub-second LLM responses. Bookmark this.
1. KV-Caching
2. Speculative Decoding
3. FlashAttention
4. PagedAttention
5. Batch Inference
6. Early Exit Decoding
7. Parallel Decoding
8. Mixed Precision Inference
9. Quantized Kernels
10. Tensor

18

261

2K

Dan Alistarh

@DAlistarh

13 days

🚀 We are releasing state-of-the-art post-training quantization (PTQ) algorithms for Microscaling FP4, together with kernels:
- First study focused on MXFP4/NVFP4 PTQ for LLMs
- New Micro-Rotated (MR) format and GPTQ algorithm
- QuTLASS GPU kernels with up to 3.6x speedups.

1

28

150

Jino Rohit

@jino_rohit

15 days

I'm building my own OpenCV from scratch - fastcv. fastcv is a C++ CUDA rewrite, with PyTorch bindings, of the image filters in the OpenCV library. I have already written two optimized kernels and will keep studying and implementing more. I have also added current benchmarks.

38

58

1K

alexine 🏴‍☠️

@alexinexxx

5 days

good morning
>first day of unemployment
>gpu programming seems so cool
>wrote my first kernels

112

90

5K

Wendy🌸

@akinyi__wendy

7 days

Celebrating 500 downloads🤩🎉 The earliest preprints are now available on Zenodo.
Spiral kernels: https://t.co/UPHQ4YdoU3
Sound localization: https://t.co/T1bAzaUsC8
Ultra lightweight QCNN: https://t.co/IKVDFxwCR9
SKCNN for DNA classification: https://t.co/8qTYs4W77j

6

2

34

Hashmitha

@Physicla_

10 days

Morning grind ☆
> Read a research paper (ML + Open Quantum systems)
> Did theory part of SVM, QSVM
> Implemented SVM on Iris dataset using different kernels
> Implemented QSVM from scratch using numpy on synthetic dataset
> Implemented QSVM using pennylane

5

194

skymeds store | Online Ivermectin Pharmacy

@skymeds_store

12 days

Cured from stage 4 metastatic cancer in the breast, lungs, stomach, ovaries & bones with soursop along with apricot kernels and black seed oil. Told she was going to die but is now cancer free. /1

27

343

1K

Rishabh Anand

@rishabh16_

14 days

This is a super cool topic! I did a class project with a friend for a course on Kernels during senior year in college 🔗: https://t.co/FWmUOdbJjc
Lots of fun connections between kernels and self-attention, especially when learning periodic functions.
The attention patterns

Peyman Milanfar

@docmilanfar

15 days

How Kernel Regression is related to Attention Mechanism - a summary in 10 slides. 0/1

0

55

446

λux

@novasarc01

10 days

FlashInfer redefines how attention kernels, kv-cache layouts and dynamic runtimes are compiled and scheduled for efficient LLM serving. Check out my latest blog "Dissecting FlashInfer - A Systems Perspective on High-Performance LLM Inference".

9

43

399

Jino Rohit

@jino_rohit

9 days

Back to writing cuda kernels again!

1

59

Mitchell Kernels

@MitchellKernels

14 hours

Marching Band Competition in Omaha, Nebraska

0

1

Elliot Arledge (h/eng)

@elliotarledge

1 day

in the lectures below, i hold your hand through low-level LLM systems engineering. it includes everything up to TODAY!
1) pytorch tensors
2) large matmul on cpu vs gpu
3) JAX (and why xAI uses it instead of pytorch)
4) raw cuda kernels and global threading indexing
5) triton

30

76

800

Kelly || UGC Creator

@UGC_KellyJ

1 day

For a product like popcorn where people are going to ask WHY would I pay way more...you gotta show them how it's actually different (unique flavors, wrapped kernels) & that it's SO worth it (aka, it's freaking delicious). Throw in an innuendo hook and you're good to roll 😅 #ugc

8

0

17

Jino Rohit

@jino_rohit

13 days

fastcv sobel edge detector kernel is 1200x faster than opencv for 4k images. more benchmarks (RTX 4060 Ti and 4k images)
1. blur kernel is 4x faster than opencv
2. grayscale kernel is 12x faster than opencv for 4k images.
Writing more kernels every single day!

18

32

386

Ashutosh Maheshwari

@asmah2107

9 days

Optimizations I’d study if I wanted real-time Gen AI. Bookmark this.
1. Streaming Generation
2. Token Parallelism
3. Prefetch Pipelines
4. CUDA Graphs
5. Speculative Decoding
6. PagedAttention
7. KV Cache Quantization
8. Dynamic Batching
9. FP8 Kernels
10. Asynchronous Prefill
11. Memory

8

90

589

Akshay 🚀

@akshay_pachaar

4 days

If anyone needs a video guide to Karpathy's nanochat, check out Stanford's CS336! It covers:
- Tokenization
- Resource Accounting
- Pretraining
- Finetuning (SFT/RLHF)
- Overview of Key Architectures
- Working with GPUs
- Kernels and Tritons
- Parallelism
- Scaling Laws
-

12

281

2K

Kernels

@KernelsAI

4 days

Excited to attend the Aptos Experience with @rockyntheblock as an Ambassador for 🏝️ Resort Maker on @Aptos ✈️🗽 Here for RWAs, data, and some good time in NYC? Let’s connect :) Catch you at the conference! @KernelsAI @exponentlabshq

1

2

10

Radical Numerics

@RadicalNumerics

9 days

We’re also hiring aggressively. Reach out if you’re interested in building automated research environments and agents. (AI researchers and SWEs, pre/mid/post training, architecture design, kernels, lots of backend system design, and automation) Our team is behind the tech for

16

5

110

Trevor3DPrints

@Trevor3DPrints

7 days

Inside info on Elegoo! Introducing the Pop-n-Print 3000! 🚀 Multi-color 3D printing is now as easy as movie night. Load the spools & kernels, hit start, & let this high-tech marvel handle the print AND the popcorn. Kick back, game on, & enjoy simultaneous 3D printing & snacking!

0

3