Graham Markall

@gmarkall

Followers: 1K · Following: 941 · Media: 188 · Statuses: 4K

Professional interests: Python, CUDA, @numba_jit. Personal interests: RISC-V, PSXDev, OSHW, 日本語. Fun: family, cycling, running. Also @[email protected]

Lincoln, UK
Joined May 2010
@gmarkall
Graham Markall
2 years
Apparently it's #MyTwitterAnniversary, so I'll use it as an opportunity to link to the Mastodon account I now use:
@gmarkall
Graham Markall
3 years
Do you know about automatic differentiation and / or Enzyme? Could you help sketch out how AD support for @numba_jit could be implemented? Issue / thread:
github.com
Feature request It would be great if Numba supported automatic differentiation. Maybe using Enzyme would be the easiest way as it operates directly on the IR of LLVM. Another possible source of ins...
@gmarkall
Graham Markall
3 years
I've been on Mastodon a while but I got out of the habit of using it - this is my account that I'm starting to get active with again: (is this usually written as @gmarkall@mastodon.social?). It has an old profile pic, still need to upload my current one.
mastodon.social
384 Posts, 389 Following, 302 Followers · Professional interests: Python, CUDA, Compilers. Personal interests: RISC-V, PSXDev, OSHW, 日本語. Recreation: family time, cycling, running, cooking.
@gmarkall
Graham Markall
3 years
@anthonypjshaw 6. Finally, maybe this should have been a blog post with some code! Would that be interesting? What else should be answered / elaborated on if I write this up?
@gmarkall
Graham Markall
3 years
5. "CPython Internals" by @anthonypjshaw is a great intro / reference to CPython, saved me a lot of time, and got me up to speed quickly for this endeavour. Highly recommended if you want to poke about with this sort of stuff!
realpython.com
Unlock the inner workings of the Python language, compile the Python interpreter from source code, and participate in the development of CPython. The "CPython Internals" book shows you exactly how.
@gmarkall
Graham Markall
3 years
4. Performance is not good for my naive implementation - literally every Python alloc results in a CUDA driver API call, instead of allocating arenas and doing the other clever things the Python allocators do. That could be solved, though, perhaps by reusing a lot of what CPython already does.
@gmarkall
Graham Markall
3 years
3. Context management is a bit fiddly - e.g. if the allocator is the first use of CUDA it needs to call cuInit and set up a context, but then other libraries like Numba can be unhappy. Other libraries switching the context (such as Numba again) also need to be handled gracefully.
@gmarkall
Graham Markall
3 years
2. Using the CUDA memory management APIs (cuMemAllocManaged etc.), it was straightforward to implement the required methods (malloc, calloc, realloc, free). cuMemGetAddressRange was convenient for realloc. I used the driver API to keep things simple (for me):
@gmarkall
Graham Markall
3 years
1. The PyMem_SetAllocator API perhaps doesn't support as broad a range of use cases as I'd like - it seems intended mainly to support tracing and debugging. I'm not sure I can turn off my allocator once it's on - I see no way to migrate allocations.
docs.python.org
Overview: Memory management in Python involves a private heap containing all Python objects and data structures. The management of this private heap is ensured internally by the Python memory manag...
@gmarkall
Graham Markall
3 years
I've been experimenting with using CUDA Unified Memory for the Python heap, towards a general goal of experimenting with a more unified CPU / GPU execution model within CPython, implemented as PyMemAllocatorEx instances. I have a few thoughts so far.
@gmarkall
Graham Markall
3 years
RT @RAPIDSai: Do more on GPUs with less code - check out @gmarkall's new blog on the @numba_jit high-level API
@gmarkall
Graham Markall
3 years
RT @numba_jit: Public service announcement: Yes, the Numba team is aware of the Python 3.11 release. 💥 Yes, we are working on it. 💪 Please…
@gmarkall
Graham Markall
3 years
Lil identify xei9pa2AeQu6.
@geerlingguy
Jeff Geerling
3 years
your rap name is "lil" + the last message you sent in IRC.
@gmarkall
Graham Markall
3 years
RT @__mharrison__: Do you use the GPU in Python? If so, for what, and what library do you use? 🤔.
@gmarkall
Graham Markall
3 years
"Use a font that is way too small" - every corporate PowerPoint template for technical talks ever.
@gmarkall
Graham Markall
3 years
With @numba_jit 0.56 you can use the high-level API to extend the CUDA target. It's much simpler to use than the low-level API; this notebook shows how to use it for some quick examples: HLA docs: Pic: using it to implement clock64()
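A sketch of what the clock64() example above looks like with the 0.56 high-level API (this is a reconstruction, not the notebook's exact code, and it needs numba >= 0.56 plus a CUDA GPU to run): an @intrinsic registered for the CUDA target that emits the inline PTX to read the %clock64 special register.

```python
from llvmlite import ir

from numba import cuda, types
from numba.core.extending import intrinsic


@intrinsic(target="cuda")
def clock64(typingctx):
    # No arguments; returns the 64-bit cycle counter.
    sig = types.uint64()

    def codegen(context, builder, signature, args):
        # Read the %clock64 special register via inline PTX.
        fnty = ir.FunctionType(ir.IntType(64), [])
        asm = ir.InlineAsm(fnty, "mov.u64 $0, %clock64;", "=l")
        return builder.call(asm, [])

    return sig, codegen


@cuda.jit
def measure(out):
    start = clock64()
    # ... work to be timed would go here ...
    out[0] = clock64() - start
```

Compare this with the low-level approach, which would require writing typing and lowering registrations against the target's internals by hand.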
@gmarkall
Graham Markall
3 years
Money-saving tip: instead of buying an expensive chef's knife, simply take a PCIe slot blanking plate out of a cheap PC case.