Matt
@matt_dz
Followers: 5K, Following: 54K, Media: 568, Statuses: 7K
C++, Compilers, Computer Architecture, Generic Programming, GPGPU, HPC, Machine Learning, Numerics, Parallel Computing, Quantitative Finance
Joined November 2010
Computer Architecture, C++, and High Performance - Meeting C++ 2016 https://t.co/yDPCRr7YWf
https://t.co/QpzBdHzbtI
https://t.co/J14zg1QbTj
github.com
A categorized list of C++ resources. Contribute to MattPD/cpplinks development by creating an account on GitHub.
Computer Architecture, C++, and High Performance - Matt P. Dziubinski - Meeting C++ 2016 https://t.co/X65Sr3IJ9V
#cpp
#cplusplus
From Daan Leijen and @anton_lorenzen: an extended version of their POPL 2023 paper, "Tail Recursion Modulo Context: An Equational Approach", which calculates a generalisation of the "tail recursion modulo cons" optimisation from its specification. https://t.co/2caVArbQoU
cambridge.org
Tail recursion modulo context: An equational approach (extended version) - Volume 35
Xavier Leroy’s new book “Control structures in programming languages: from goto to algebraic effects” https://t.co/EXcds9inhg
New blog post: Machine Scheduler in LLVM - Part II https://t.co/MmpteDtbPb
myhsu.xyz
In the first part of this series, we covered the basic workflow of Machine Scheduler – LLVM’s predominated instruction scheduling framework – and learned that an instruction could go through three...
Opportunistically Parallel Lambda Calculus https://t.co/RSkoZKt0V5
https://t.co/De6syqWsuq OOPSLA 2025 Stephen Mell (@stephenlmell), Konstantinos Kallas (@KonsKallas), Steve Zdancewic, Osbert Bastani
I'm curious if compiler Twitter has a take on this. Let's imagine you are trying to design an IR for a frontend language that has mutation and aliasing, where you expect advanced *users* to be writing compiler passes. It's probably a bad idea to have your IR have both mutation
New article! https://t.co/CvpNnFkmWr "The Impossible Optimization, and the Metaprogramming To Achieve It" TL;DR: If you warp your mind a bit, you can apply metaprogramming to speed up your code by about 10x. Enjoy!
A small thread about how you should be drawing the contents of higher dimensional tensors
Wrote a 1-year retrospective with @a1zhang on KernelBench and the journey toward automated GPU/CUDA kernel generation! Since my labmates (@anneouyang, @simran_s_arora, @_williamhu) and I first started working towards this vision around last year’s @GPU_mode hackathon, we have
"How NOT To Program an Out-of-order Vector Processor" slides are public. https://t.co/0zYHoUP3l5
Congratulations to Vu Le, @Chengnian & @zhendongsu on receiving the Most Influential OOPSLA Paper Award at #SPLASH2025 for their OOPSLA'15 paper "Finding Deep Compiler Bugs via Guided Stochastic Program Mutation"! 📽️ Award presentation: https://t.co/zanFafAe1i
@CSatETH @splashcon
Great work! This kind of interoperability will help unlock new cross-compiler optimizations to push kernel performance to the extreme.
📢Excited to introduce Apache TVM FFI, an open ABI and FFI for ML systems, enabling compilers, libraries, DSLs, and frameworks to naturally interop with each other. Ship one library across pytorch, jax, cupy etc and runnable across python, c++, rust https://t.co/m2gHJRreol
I gave a talk a few days ago at REBASE about the work the Verse engineering team is doing to implement a new VM for Verse and a software transactional memory runtime and compiler for C++.
Read the latest updates on the #Clang bytecode interpreter. Discover how 500+ commits have made the implementation more solid, reduced test failures, and improved performance for compile-time constant evaluation. https://t.co/FCp7KbBkCH
developers.redhat.com
It’s October again, so let me tell you what happened with the clang bytecode interpreter this year. In case this is the first you've encountered this topic: This is a project for a bytecode
I wrote a blog post about fast call-stack backtracing. Hopefully, someone making an intrusive profiler, memory tracker, or logger will find it useful... https://t.co/Lb6j5WYkjr
"Notes About Nvidia GPU Shared Memory Banks" from @axel_s_feldmann
https://t.co/jkrDT5EuBQ
feldmann.nyc
You can find our paper here: https://t.co/Y1O2P3Jhwh The online demo is here: https://t.co/9TaQrWayko And if you can't attend, you have the chance to watch it live:
Should a good parallel language design be minimal in nature? Read Brad Chamberlain's take on this question in this month's edition of his "10 Myths About Scalable Parallel Programming Languages" blog series. https://t.co/8IqPkvTa2i