Alex Miller
@AlexMillerDB
Followers
3K
Following
403
Media
75
Statuses
942
Databases. See also @[email protected] or @alexmillerdb.bsky.app
Joined May 2014
I can’t seem to think of blog posts I’ve seen over time that reflect all the things I had to learn by failing at them.
2
0
0
I’m thinking of things like:
* Driving consensus between conflicting (busy) approvers
* Ensuring you get proper feedback from design docs
* Doing good work is as important as performatively doing good work
* How to make the case for a new project
1
0
3
Does anyone have links to good writing on the sort of soft skills you learn from working in larger organizations about how to work in larger organizations as an IC? The overall space of soft skills dealing with the pretty common ways that large corporations behave.
2
0
7
The recording from our last South Bay Systems meetup is now available! https://t.co/4Q7Q4d5bFo
0
8
43
[PVLDB] Enhancing Transaction Processing through Indirection Skipping https://t.co/SiQB5ohFEJ Whereas VMCache improves on pointer swizzling's complexity by removing the swizzling, this work points out that page and frame hints are highly effective, and okay if they're wrong.
2
6
43
Then, we'll have "The Evolution of Semi-Structured Data Analytics" by Owen Xiao, Co-founder of VeloDB and PMC member of Apache Doris, where we'll hear about the difficulties of analytics on semi-structured data, and the approach that Apache Doris took to address them.
0
0
3
Our first talk is "Low-Latency Serving on Cloud Object Stores with Apache Pinot" by Songqiao Su and Raghav Yadav, both Staff Engineers from StarTree, to talk about storage tiering and iceberg support under low-latency, real-time analytics requirements.
1
0
3
South Bay Systems returns on October 27th at Adobe in downtown San Jose. We have an Analytics-on-Object-Storage double feature this time starring two different Apache projects: Apache Pinot and Apache Doris. (Talk descriptions below.) Register now!
luma.com
Welcome to another edition of South Bay Systems! This time, we'll have a double feature! First we'll have Songqiao Su and Raghav Yadav talking about…
1
3
18
[ASPLOS'25] Fusion: An Analytics Object Store Optimized for Query Pushdown https://t.co/ntBQ3njtlw Tightly integrating an Iceberg catalog with an object store means that one could make file-format aware erasure coding decisions, to permit pushing down filters and aggregations.
1
14
104
[VLDB] Towards Principled, Practical Document Database Design https://t.co/EA819FKjLo If you've ever wished that there was a document database equivalent for relational databases' 3NF-style schema design guidance, then this is the paper for you.
0
8
56
[arXiv] On the Theoretical Limitations of Embedding-Based Retrieval https://t.co/z5i3qCDnaq It's impossible to retrieve all combinations of pairs of documents post-embedding. Thus, there are use cases that vector search won't do well at. Conversely, BM25 excels in these cases.
1
1
28
WebAssembly is Cool! (finally!) - The South Bay Systems talk series is back on Oct 2nd. Jakob will talk about the history and novel use cases of WebAssembly. Perfect timing too: Wasm 3.0 just released, and it is turning 10 this year. From browsers to embedded systems and beyond,
1
4
16
Thread on it with a better overview from the author. Great paper, @g_sehgal1997!
🎉 Thrilled to announce that our paper "NaviX" on vector search has been accepted to one of the top systems conferences, @VLDBconf! 🚀 Happening in London 🎡 this September. 📝 : https://t.co/HzPyBMKurI 🧑💻 : https://t.co/AnyIYUxXnH 1/18
1
0
3
[VLDB] NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance https://t.co/74gmNi9qMh It feels like a follow-on/improvement to ACORN. Also interesting to see HNSW built directly on a graph database working well.
1
2
20
[arXiv] Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement https://t.co/Gn3mgXFRAn Great to see the Voltron Data folk writing about their GPU database!
0
5
80
Relatedly, there are PRNGs faster than Mersenne Twister that are of reasonable quality. The Xoshiro family seems well respected https://t.co/Ri1TqPtD9K, and vectorizes! https://t.co/UiUNANRWzp and https://t.co/lWSnHNFopp are also interesting reads.
0
1
3
To randomly sample a number of operations, one pulls from a PRNG. https://t.co/QBeHJYZh0U instead shows a cute trick for defining a stateless PRNG: pull RDTSC, run it through a quick hash to scramble the bits (e.g. rapidhash). Cache-miss-free, but you lose determinism in tests.
github.com
MySQL RP (Restore Performance) is modified version of MySQL Community, to restore performance equal to or better than previous major versions. - buildup-db/mysql-server-RP
2
0
24
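A rough sketch of the stateless sampling trick described above, in Python. Two substitutions are assumptions made for portability: `time.perf_counter_ns()` stands in for RDTSC, and a splitmix64-style finalizer stands in for rapidhash. The function names are hypothetical and do not come from the linked repository.

```python
import time

MASK64 = (1 << 64) - 1  # keep all arithmetic within 64 bits


def mix64(x: int) -> int:
    """Scramble a 64-bit value (splitmix64 finalizer, standing in for rapidhash)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def should_sample(one_in: int) -> bool:
    """Decide whether to sample this operation, roughly once per `one_in` calls.

    No PRNG state is read or written: the only input is a clock reading,
    so there is no shared state to miss in cache or to contend on.
    """
    ticks = time.perf_counter_ns()  # assumption: stand-in for RDTSC
    return mix64(ticks) % one_in == 0
```

Because the input is a clock reading, two runs never see the same sequence, which is the loss of test determinism the post mentions. Injecting the tick source as a parameter would be one way to get it back in tests.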
Had our biggest South Bay Systems meetup yet last night! Thanks to everyone who came and joined the vibrant discussion. Big thanks to @databricks for hosting! @andy_pavlo from CMU gave a deep dive into the 50-year history of database tuning, his work applying AI/ML to the
2
7
97
Attention, South Bay folk! We have The Databaseologist, @andy_pavlo, giving a talk in the bay on August 6th. Come join us for a great time in hearing: ChatGPT Ain’t Got $%@& On Me! The Future of Automated Database Tuning Register now!
luma.com
We're excited to feature Andy Pavlo, illustrious database professor at CMU, to talk about database tuning. This meetup's venue, food and drinks, are generously…
1
9
53
I had missed @ssougou's blog post series on consensus when it was originally posted. I really like the perspective of breaking down Raft/Paxos/etc. into the individual actions that comprise consensus. https://t.co/d80Q34RjX7
planetscale.com
This is a multi-part blog series and will be updated with links to the corresponding posts.
1
24
241