Adam P. Goucher
@apgox
Followers
1K
Following
6K
Media
204
Statuses
3K
Algorithmist
Cambridge
Joined September 2014
We just published "Hash-based signatures for Bitcoin," a new analysis of post-quantum schemes by @kudinov_mikhail and me at @blksresearch. This paper serves as a gentle intro to hash-based schemes and explores how to optimize them specifically for application in Bitcoin. 🧵
49
253
1K
I worked with Sam extensively at DeepMind (for example on arxiv.org/abs/2105.13343 = multiple augmentations of the same data point in the batch => better & faster training) and this seems like an amazing opportunity to join a great team & mentor at the same time!
The Training team @OpenAI is hiring researchers in London 🚀 Our twin missions are to train better LLMs, and serve them more cheaply. Get in touch if you are excited to collaborate on architecture design, reliable scaling, and faster optimization.
0
1
15
@Leik0w0 @AltmejdAdam @itsclivetime Scott wrote his own SASS assembler so that he could get his matmul kernel to 98% of theoretical throughput (ptxas could only get 70%). https://t.co/HazorS0gns I learned so much of what I know about writing efficient CUDA from reading Scott’s sgemm walkthrough!
github.com
NervanaSystems/maxas: Assembler for NVIDIA Maxwell architecture.
3
7
83
This was the culmination of a long story! These ideas originated in writing AVX assembly back in 2019 for transposing bitmatrices, and have now finally come to fruition as a general framework for choosing register layouts on SIMD architectures:
0
0
22
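For context, here is a scalar sketch of the kind of bitmatrix transpose involved. This is the classic 64-bit "delta swap" version (the 2019 work used AVX registers, which apply the same idea to wider lanes); the function name transpose8x8 is mine, not from the framework.

```cpp
#include <cstdint>

// Transpose an 8x8 bitmatrix packed row-major into a 64-bit word, via three
// rounds of delta swaps (each round swaps one level of the index bits).
std::uint64_t transpose8x8(std::uint64_t x) {
    std::uint64_t t;
    // swap the off-diagonal bits within each 2x2 block
    t = (x ^ (x >> 7))  & 0x00AA00AA00AA00AAull;  x ^= t ^ (t << 7);
    // swap the off-diagonal 2x2 sub-blocks within each 4x4 block
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCCull;  x ^= t ^ (t << 14);
    // swap the off-diagonal 4x4 sub-blocks of the whole 8x8 matrix
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0ull;  x ^= t ^ (t << 28);
    return x;
}
```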
Here's the code that we (mostly GPT-4.5 rather than me) wrote together: https://t.co/qelwXcGnzY
github.com
This avoids pathological edge cases (often involving tl.sort) where the ttgir graph grows exponentially as a result of repeated rematerialisations. Tested on internal benchmarks and there are no obs...
0
0
2
I was pleasantly surprised with how well GPT-4.5 writes code: my prior experience with LLMs is that they do things pretty naively (usually with atrocious performance), but GPT-4.5 used memoization and performant data structures ab initio.
1
0
2
To get GPT-4.5 to be maximally helpful, I resorted to prompting it as follows:
-- stating the high-level intent first;
-- pasting the entire ~1500-line source file;
and then having an interactive conversation with it.
1
0
2
As a result, in certain rare cases (especially when using the bitonic sort operation, which triggers this behaviour) you would get many successive duplications, resulting in an exponential increase in intermediate IR size (and thus compilation time).
1
0
2
Essentially, the problem arose from how eagerly Triton's backwardMaterialization pass would duplicate parts of the IR graph to avoid layout conversions: if anything consisted purely of 'cheap arithmetic' it would get duplicated, irrespective of the amount of arithmetic.
1
0
3
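A toy model of that blow-up (my own illustration, not Triton's actual IR or pass): if each of k sort-like stages forces the whole chain of cheap arithmetic above it to be rematerialised once per consumer, the op count roughly doubles per stage.

```cpp
#include <cstdio>
#include <cstdint>

// Toy model: each "stage" is a block of cheap arithmetic whose result is
// needed in two different layouts (e.g. on either side of a sort).  An eager
// pass that clones the entire producer chain for every consumer, with no cap
// on how much arithmetic it copies, doubles the cloned op count per stage.
std::uint64_t opsAfterEagerDuplication(int stages, std::uint64_t opsPerStage) {
    if (stages == 0) return opsPerStage;
    // two consumers with mismatched layouts => the chain above is cloned twice
    return opsPerStage + 2 * opsAfterEagerDuplication(stages - 1, opsPerStage);
}

int main() {
    for (int k = 4; k <= 24; k += 4)
        std::printf("stages=%2d  ops=%llu\n", k,
                    (unsigned long long)opsAfterEagerDuplication(k, 8));
    // Growth is ~2^stages: around 20 sort-like stages already yields tens of
    // millions of intermediate ops, and compile time grows accordingly.
}
```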
I've had my first successful experience of 'vibe-coding' today: using GPT-4.5 (which is far more au fait with LLVM/MLIR than I am) to modify a Triton compiler pass to avoid certain edge cases with exponential compilation times. 🧵
3
0
9
I've written up what I know about a heavily customised PDP-3 computer built in 1960 by Charles Corderman and collaborators (originally for military applications, and later used for recreational mathematics): https://t.co/6KBR8nxLks
cp4space.hatsya.com
This is an atypical post, being chiefly about the history of a rather obscure computer that was built in 1960 out of repurposed PDP parts, but it needs to be written somewhere lest it be forgotten…
0
0
3
Not at all! You can generate and store hashes of all strings within a Levenshtein distance of r of each password, and check whether there are any collisions between the two radius-r balls; that determines whether the passwords are within a distance of 2r of each other. (r=1 is very practical.)
Just a reminder to everybody: If a website compares your new password with your old one, the web browser has to send a plain text password instead of a hashed one. This is a security issue.
1
0
11
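A minimal sketch of that radius-r ball trick for r = 1. The alphabet, the helper names (ball1, ballHashes, within2) and the use of std::hash are all illustrative assumptions; a real deployment would hash each variant with the same slow, salted password hash used for the stored credential.

```cpp
#include <string>
#include <unordered_set>
#include <functional>
#include <iostream>

// Illustrative alphabet for substitutions/insertions; extend as needed.
static const std::string kAlphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// All strings within Levenshtein distance 1 of s (including s itself):
// every single deletion, substitution and insertion.
std::unordered_set<std::string> ball1(const std::string& s) {
    std::unordered_set<std::string> out{s};
    for (std::size_t i = 0; i < s.size(); ++i)
        out.insert(s.substr(0, i) + s.substr(i + 1));              // delete s[i]
    for (std::size_t i = 0; i < s.size(); ++i)
        for (char c : kAlphabet)
            out.insert(s.substr(0, i) + c + s.substr(i + 1));      // substitute s[i]
    for (std::size_t i = 0; i <= s.size(); ++i)
        for (char c : kAlphabet)
            out.insert(s.substr(0, i) + c + s.substr(i));          // insert before i
    return out;
}

// Hashes of the radius-1 ball: this is what gets stored, not the plaintext
// (std::hash is only a stand-in for a proper salted password hash).
std::unordered_set<std::size_t> ballHashes(const std::string& s) {
    std::unordered_set<std::size_t> h;
    for (const auto& v : ball1(s)) h.insert(std::hash<std::string>{}(v));
    return h;
}

// The two strings are within Levenshtein distance 2 exactly when their
// radius-1 balls intersect: intersection implies distance <= 2 by the
// triangle inequality, and distance <= 2 implies that an intermediate string
// on an optimal edit path lies in both balls.
bool within2(const std::string& oldPw, const std::string& newPw) {
    const auto stored = ballHashes(oldPw);
    for (const auto& v : ball1(newPw))
        if (stored.count(std::hash<std::string>{}(v))) return true;
    return false;
}

int main() {
    std::cout << within2("hunter2", "hunter3") << '\n';      // 1 (distance 1)
    std::cout << within2("hunter2", "Hunter3") << '\n';      // 1 (distance 2)
    std::cout << within2("hunter2", "correcthorse") << '\n'; // 0
}
```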
it is hard to overstate how much alec radford has contributed to the field, and how much of everyone's current progress traces back to his work. i believe he is a genius at the level of einstein, and also he is one of my favorite people ever--hard to imagine a nicer, warmer, or
297
392
8K
AGI has been achieved internally
New verified ARC-AGI-Pub SoTA! @OpenAI o3 has scored a breakthrough 75.7% on the ARC-AGI Semi-Private Evaluation. And a high-compute o3 configuration (not eligible for ARC-AGI-Pub) scored 87.5% on the Semi-Private Eval. 1/4
0
1
11
Haven’t been on here for over a year but a special exception has to be made to send a huge thanks & to express my sheer admiration of @tessybarton for creating this work of art & gift wrapping with such TLC (and a chip bow!) (and ofc @apgox for such a unique surprise) #gpu #bag
1
3
17
Somehow this Montgomery trick has managed to replace that expensive reduction with cheap arithmetic plus a one-time preprocessing step, where remarkably the preprocessing step only involves reducing a 64-bit integer. This feels impossible!
3
0
0
Suppose that you wanted to compute ab (mod N) without this Montgomery trick. If N is a 64-bit integer, then the product ab requires 128 bits to store, so you'd need to reduce a 128-bit integer modulo a 64-bit integer, which usually requires an expensive function call!
1
0
0
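For contrast with the Montgomery version below, here is what the straightforward approach just described looks like; the 128-by-64-bit remainder is typically lowered to a slow divide or an out-of-line helper call (e.g. __umodti3 with GCC/Clang on x86-64). The function name mulmod_naive is mine.

```cpp
#include <cstdint>

// Naive modular multiplication: the full 128-bit product a*b must be reduced
// modulo the 64-bit N, which compilers generally turn into an expensive
// 128/64 division (often a call to a runtime helper function).
std::uint64_t mulmod_naive(std::uint64_t a, std::uint64_t b, std::uint64_t N) {
    return (std::uint64_t)((unsigned __int128)a * b % N);
}
```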
Why do I find this remarkable? Well, it means that you can do any amount of arbitrary ring arithmetic mod N (any odd constant fitting in a machine word) just by using cheap processor instructions together with a preprocessing step involving 1 machine-word-sized modular reduction.
1
0
0
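A minimal sketch (mine, not quoted from the thread) of what that cheap arithmetic looks like with R = 2^64, assuming an odd modulus N < 2^63 so intermediate values fit comfortably; the names neg_inverse, redc and mont_mul are illustrative.

```cpp
#include <cstdint>

using u64  = std::uint64_t;
using u128 = unsigned __int128;   // GCC/Clang extension

// One-time preprocessing that needs no division at all: N' = -N^{-1} mod 2^64
// by Newton iteration (each step doubles the number of correct low bits).
u64 neg_inverse(u64 N) {                       // N must be odd
    u64 inv = N;                               // correct modulo 2^3 for odd N
    for (int i = 0; i < 5; ++i) inv *= 2 - N * inv;
    return 0 - inv;                            // -N^{-1} mod 2^64
}

// REDC: for T < N * 2^64, returns T * R^{-1} mod N using only multiplies,
// adds and shifts -- no runtime division or modulo instruction.
u64 redc(u128 T, u64 N, u64 nprime) {
    u64 m = (u64)T * nprime;                   // chosen so T + m*N == 0 mod 2^64
    u64 t = (u64)((T + (u128)m * N) >> 64);    // exact division by R = 2^64
    return t >= N ? t - N : t;                 // result in [0, N)
}

// Multiply two residues held in Montgomery form (aR mod N, bR mod N); the
// result (ab)R mod N stays in Montgomery form, so arbitrarily long chains of
// ring arithmetic never touch a divide.
u64 mont_mul(u64 x, u64 y, u64 N, u64 nprime) {
    return redc((u128)x * y, N, nprime);
}
```

Addition and subtraction mod N act directly on Montgomery representatives, so once the preprocessing constants are in hand, everything reduces to word-sized multiplies, adds, shifts and compares.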
The Montgomery representative of 2 is just 2R (mod N), which you can obtain in the following way:
- compute (R/2) mod N with a single machine-word modulo instruction (the only time we ever use this!);
- double it twice to get 2R mod N.
See the top of
2
0
1
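A sketch of that recipe, under the same illustrative assumption as above (odd N < 2^63, so doubling cannot overflow); double_mod and montgomery_two are hypothetical names of my own.

```cpp
#include <cstdint>
#include <cassert>

// Double x modulo N; safe because x < N < 2^63 means x + x cannot overflow.
std::uint64_t double_mod(std::uint64_t x, std::uint64_t N) {
    std::uint64_t t = x + x;
    return t >= N ? t - N : t;
}

// The Montgomery representative of 2, i.e. 2R mod N with R = 2^64:
//   1. (R/2) mod N = 2^63 mod N, the only hardware modulo ever executed;
//   2. double it twice (mod N) to reach 2R mod N.
std::uint64_t montgomery_two(std::uint64_t N) {
    assert((N & 1) && N < (1ull << 63));
    std::uint64_t half_R = (1ull << 63) % N;      // (R/2) mod N
    return double_mod(double_mod(half_R, N), N);  // R mod N, then 2R mod N
}
```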