Dougall @dougallj X Profile

Dougall

@dougallj

Followers

3K

Following

12K

Media

136

Statuses

3K

he/they | mastodon: https://t.co/d5YdiePIr8 / @dougall@mastodon.social

https://t.co/nEExqZiixX

Joined February 2007

Dougall

@dougallj

11 months

Not news, but: Fediverse (active): https://t.co/R9bvhzseVX / @dougall @mastodon.social Bluesky:

0

2

Dougall

@dougallj

3 days

My view. https://t.co/wojAwRsg23

Dougall

@dougallj

3 years

My view. https://t.co/4vKxhS4v0A

0

7

hikari 🌟 (fell out of the sky)

@hikari_no_yume

11 months

very sad that cohost is going. maybe its unique genre of posting, the css crime, will go with it. well, i'm glad that there's this worthy goodbye to it

0

1

17

AnandTech

@anandtech

1 year

After 27 years of providing in-depth coverage of the amazing world of PC and mobile hardware, AnandTech is saying farewell. We want to thank everyone from the AnandTech community for their support and passion for what we’ve done over the years https://t.co/3EGh4FJguE

forums.anandtech.com

Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

433

1K

7K

Rui Ueyama

@rui314

1 year

CRC checksum is not a cryptographic hash function. It is easy to add a few bytes of garbage at the end of a file so that the CRC of the entire file becomes a desired value. But did you need to do that in the linker? Yes. https://t.co/Wu7gU2j5Hb

3

15

110
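For readers wondering how the CRC trick in the tweet above can work, here is a minimal sketch. It assumes the common CRC-32 used by zlib (reflected polynomial 0xEDB88320) and appends exactly four bytes; the linked post and the linker itself may use a different CRC or a different method. The names `crc32_suffix`, `TABLE`, and `TOP_TO_INDEX` are made up for this illustration.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib


def _make_table():
    # Standard byte-at-a-time CRC-32 lookup table.
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()
# The top bytes of the 256 table entries are all distinct, so the top byte
# of a CRC state identifies which table entry produced it.
TOP_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}


def _unstep_zero_byte(state):
    # Undo one CRC step that consumed a zero byte:
    # new = TABLE[old & 0xFF] ^ (old >> 8)
    i = TOP_TO_INDEX[state >> 24]
    return (((state ^ TABLE[i]) << 8) & 0xFFFFFFFF) | i


def crc32_suffix(data, target):
    """Return 4 bytes such that zlib.crc32(data + suffix) == target."""
    state = zlib.crc32(data) ^ 0xFFFFFFFF  # internal state after data
    want = target ^ 0xFFFFFFFF             # internal state we must end in
    for _ in range(4):
        want = _unstep_zero_byte(want)
    # Feeding bytes b is the same as XORing them into the state and then
    # feeding zero bytes, so the suffix is the XOR of the two states.
    return (want ^ state).to_bytes(4, "little")
```

Usage: `zlib.crc32(data + crc32_suffix(data, 0xDEADBEEF))` comes out as `0xDEADBEEF` for any `data`, which is why a CRC offers no protection against deliberate tampering.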

Andreas Schilling 🇺🇦

@aschilling

1 year

Analysis of the M4 by @techinsightsinc „key design choice in the Apple M4 is the use of TSMC's high-performance standard cell library for CPU1“ …“UHD libraries for GPU and CPU2“ aka TSMC FinFlex CPU1: P-Cores CPU2: E-Cores https://t.co/DMuS9Tkg3h

9

61

380

Jon Masters 🏴‍☠️

@jonmasters

1 year

Starting to think this @Arm thing might have legs

7

1

27

INIYSA

@lafaiel

1 year

Intel is now essentially following Apple's design philosophy, with an integrated memory architecture, a large front-end, a large L1 cache, removal of SMT, 4+4 cores

39

96

1K

Dougall

@dougallj

1 year

"Vulkan 1.3 on the M1 in 1 month", by Alyssa Rosenzweig https://t.co/ZyS9EaMBfK

2

14

65

Alexander Yee

@Mysticial

1 year

With Lisa's keynote over, I will casually drop this to confirm that y-cruncher v0.8.5 will get #Zen5 optimizations with its big and fat #AVX512. Release date still ETA - pending final silicon. https://t.co/dohnqbg7Ex ST and BBP will be where the light shines the brightest.

Alexander Yee

@Mysticial

1 year

y-cruncher's BBP digit extractor will turn into a proper benchmark for v0.8.5. It's a pure SIMD/AVX workload that doesn't touch memory. Why? The CPU/memory perf gap is becoming so large that the classic Pi benchmarks are turning into pure memory tests. https://t.co/jNWoc0sXKt

6

13

94

Pete Cawley

@corsix

1 year

@dyaroshev @geofflangdale @HaroldAptroot @FUZxxl FWIW, I've also given it a decent write-up at https://t.co/lQoW9gnNMr. Only notable change is putting maskz on the shuffle rather than on the permutexvar, as https://t.co/V5fqZLYP4G suggests maskz costs a few cycles, so might as well put it on the cheaper of the two instructions.

corsix.org

1

8

23

Nat Brown @[email protected]

@natbro

1 year

Fantastical deep information about optimizing for Apple Silicon can now be found here:

7

31

168

clamchowder

@lamchester

2 years

Example of why power draw is a poor proxy for how well a program is utilizing your hardware (i7-7700K CPU in this case). In one case, execution port use is heavy and IPC is high, so the CPU's pipeline is running at high utilization. However power draw is low. Second case...

3

13

71

Fabian Giesen

@rygorous

2 years

New blog post: "Entropy decoding in Oodle Data: x86-64 6-stream Huffman decoders"

fgiesen.wordpress.com

It’s been a while! Last time, I went over how the 3-stream Huffman decoders in Oodle Data work. The 3-stream layout is what we originally went with. It gives near-ideal performance on the las…

1

20

74

𝐷𝑟. 𝐼𝑎𝑛 𝐶𝑢𝑡𝑟𝑒𝑠𝑠

@IanCutress

2 years

Here is @Qualcomm #Oryon single threaded performance: 👉 VS Apple M2: 14% faster at 30% less power 👉 VS Intel i9-13980HK: 1% faster at 70% less power

49

83

558

Best of Dying Twiter

@bestofdyingtwit

2 years

oh my god

Matt Binder

@MattBinder

2 years

hmm did the guy famous for baselessly accusing someone he didn’t like of being a “pedo guy” just suspend @JUNlPER after she did the same to him?

16

190

3K

Dougall

@dougallj

2 years

* 8-bit floating-point, separately configurable to use E5M2 or E4M3 formats for different operands
Scalar highlights:
* "Checked pointer" add/sub/multiply-add/multiply-sub – for faster/safer top-byte-ignore?
* New "enhanced" pointer authentication ops.

1

6
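As a rough illustration of the two 8-bit formats named in the tweet above, the sketch below decodes one byte as either E5M2 or E4M3. It assumes the encodings from the OCP 8-bit floating-point specification (E5M2 is IEEE-like with infinities; E4M3 has no infinities and a single NaN mantissa pattern); the tweet itself does not spell out the bit layouts. `decode_fp8` is a hypothetical helper, not an Arm or library API.

```python
import math


def decode_fp8(byte, fmt):
    """Decode one FP8 byte to a Python float (assumed OCP FP8 layouts)."""
    if fmt == "E5M2":
        exp_bits, man_bits, bias = 5, 2, 15
    elif fmt == "E4M3":
        exp_bits, man_bits, bias = 4, 3, 7
    else:
        raise ValueError(f"unknown format: {fmt}")

    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    # E5M2: all-ones exponent means infinity or NaN, as in IEEE 754.
    if fmt == "E5M2" and exp == exp_max:
        return sign * math.inf if man == 0 else math.nan
    # E4M3: only exponent=1111, mantissa=111 is NaN; there is no infinity.
    if fmt == "E4M3" and exp == exp_max and man == man_max:
        return math.nan
    # Subnormals: no implicit leading 1.
    if exp == 0:
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
```

Under these assumptions E5M2 trades precision for range (largest finite value 57344) while E4M3 does the opposite (largest finite value 448), which is one reason a per-operand choice of format can be useful.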

Dougall

@dougallj

2 years

New Arm extensions just dropped: https://t.co/9tQFfJNPOH
SIMD highlights:
* LUTI2 and LUTI4 – powerful pshufb/tbl-like instructions, decoding bit-packed 2-bit or 4-bit indices to 8-bit or 16-bit values
* Floating-point absolute min/max

2

9

35
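The idea behind the LUTI2/LUTI4 description in the tweet above (bit-packed small indices expanded through a lookup table) can be sketched in scalar form. This is only a conceptual model: the real instructions operate on vector registers and have operand details that are not modeled here, and the low-bits-first index order is an assumption of this sketch. `unpack_indices` and `lut_decode` are hypothetical names.

```python
def unpack_indices(packed, index_bits):
    """Split each byte into 2-bit or 4-bit indices, lowest bits first."""
    if index_bits not in (2, 4):
        raise ValueError("index_bits must be 2 or 4")
    mask = (1 << index_bits) - 1
    indices = []
    for byte in packed:
        for shift in range(0, 8, index_bits):
            indices.append((byte >> shift) & mask)
    return indices


def lut_decode(packed, table, index_bits):
    """Expand packed indices into table values (4 or 16 table entries)."""
    if len(table) != 1 << index_bits:
        raise ValueError("table must have 2**index_bits entries")
    return [table[i] for i in unpack_indices(packed, index_bits)]
```

For example, with a four-entry table and 2-bit indices, each input byte expands to four table values, so a single lookup step can turn compact quantized data into full-width 8-bit or 16-bit elements.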