
Jack Vanlightly
@vanlightly
Followers: 4K
Following: 1K
Media: 104
Statuses: 2K
@confluentinc thinking about event streaming. Ex @Splunk, @VMware https://t.co/3axXZezyy4, https://t.co/voJWmL4KM6 Credit: ESO/B. Tafreshi
Barcelona, Spain
Joined November 2016
I've written 18 posts (and counting) on table format internals. I've created a page that contains the list of my writings on the subject, including my formal verification work. Any suggestions on further table format analysis?
jack-vanlightly.com
I’ve created this page to make it easier for me to share links about my writing on table format internals. Currently, it includes Apache Iceberg, Delta Lake, Apache Hudi, and Apache Paimon.
New deep dive: Understanding Apache Fluss. I spent August reverse-engineering Fluss, Alibaba’s new table storage engine for Flink (partially forked from Kafka). This post covers its architecture, tiering, and how it tackles changelogs & low-latency state.
jack-vanlightly.com
This is a data system internals blog post. So if you enjoyed my table format internals blog posts, or my writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy this one....
Science moves slowly because wrong theories waste decades. Engineering is careful because failures kill people. Software moves fast because mistakes are cheap; the expensive error isn't making the wrong choice, it's taking too long to make any choice.
jack-vanlightly.com
A recent LinkedIn post by Nick Lebesis caught my attention with this brutal take on the difference between good startup founders and coward startup founders. I recommend you read the entire thing to...
A new case study is born.
How to reliably distribute work across microservices, stream processors, durable execution, event-driven, orchestration and now AI agents? Coordinated Progress is a 4-part series that explores the common structure behind reliable distributed systems.
jack-vanlightly.com
At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it...
Another Humans of the Data Sphere is out, with issue 10! In this issue people are talking fsyncs, tips for running ClickHouse at scale, the problems with MCP and more. Plus I dig up a classic paper from 1962.
hotds.dev
Your biweekly dose of insights, observations, commentary and opinions from interesting people from the world of databases, AI, streaming, distributed systems and the data engineering/analytics space.
The specs are in this repo for anyone interested.
github.com
TLA+ specifications for Kafka related algorithms. Contribute to Vanlightly/kafka-tlaplus development by creating an account on GitHub.
Seems like I’m not alone. For what it’s worth, I’ve got a great fit at Confluent — but the more senior I get, the more I wonder how sustainable that is across future PE roles. Thinking of writing a blog post, maybe with interviews or perspectives from PEs who aren’t natural cat.
Any Principal Engineers out there with ADHD or creative wiring — who don’t thrive in the tasks of project coordination, alignment meetings, and people management, but thrive on strategy, system design, writing, and shaping direction through ideas? Curious how you navigate the.
RT @ijuma: And the old group coordinator implementation is gone from Apache Kafka - love it when open-source projects can delete large chun….
github.com
This patch is the third of a series of patches to remove the old group coordinator. With the release of Apache Kafka 4.0, the so-called new group coordinator is the default and only option availabl...
A new disaggregated log replication survey post is out. How does the combination of Apache Pulsar with Apache BookKeeper divide and conquer the responsibilities of log replication?
jack-vanlightly.com
In this latest post of the disaggregated log replication survey, we’re going to look at the Apache BookKeeper Replication Protocol and how it is used by Apache Pulsar to form topic partitions. Raft...
Another Humans of the Data Sphere is out, with issue #9! In this issue, we also look at whether software engineers can learn from mechanical engineering, and at table formats as a form of virtualization.
hotds.dev
Your biweekly dose of insights, observations, commentary and opinions from interesting people from the world of databases, AI, streaming, distributed systems and the data engineering/analytics space.
RT @ankushpd: If you are looking for formal models of a real-world distributed system, DeepSeek @deepseek_ai released P specifications for….
A new log replication disaggregation survey post is out! The Kafka Replication Protocol:
🔹 Separation of control plane from data plane.
🔹 Role separation with minimal coupling.
🔹 Kafka’s alignment with Paxos roles.
jack-vanlightly.com
In this post, we’re going to look at the Kafka Replication Protocol and how it separates control plane and data plane responsibilities. It’s worth noting there are other systems that separate...