Jack Vanlightly
@vanlightly
Followers: 5K · Following: 1K · Media: 105 · Statuses: 2K
@confluentinc thinking about event streaming. Ex @Splunk, @VMware https://t.co/3axXZezyy4, https://t.co/voJWmL4KM6 Credit: ESO/B. Tafreshi
Barcelona, Spain
Joined November 2016
New blog post! "How Would You Like Your Iceberg Sir? Stream or Batch Ordered?" Many teams today are trying to unify their streaming and batch pipelines over Apache Iceberg. But physical unification comes with trade-offs: 🔹 Flink streaming mode and other stream processors want
jack-vanlightly.com
Today I want to talk about stream analytics, batch analytics and Apache Iceberg. Stream and batch analytics work differently but both can be built on top of Iceberg, but due to their differences...
It's that time of year again, Christmas is around the corner and my wife has knitted many more shawls! Check them out, and use the code JACKV for a 10% discount. Extra points to guess who is modeling them in the photos (spoiler, it's me 😄). https://t.co/qXmgDzsiL9
etsy.com
Shop items by SquiggleKnit.
Three KIPs (1150, 1176, 1183) all target Kafka’s cross-AZ replication costs but there is a wider question at stake. My new post explains the KIPs, the trade-offs between reusing old abstractions vs. embracing stateless compute over S3. https://t.co/GAUHFXSpw2
jack-vanlightly.com
“The Kafka community is currently seeing an unprecedented situation with three KIPs (KIP-1150, KIP-1176, KIP-1183) simultaneously addressing the same challenge of high replication costs when...
Data Science Weekly Issue 621 (last week poll result). I am in the "hate" group if only because it smacks of laziness, unoriginality and outsourcing of thought. ChatGPT can be a great writing/analysis assistant (it's a decent reviewer) but copy and paste wholesale just
New post: why I’m not a fan of “zero-copy” Iceberg tables for Apache Kafka. From a systems design view, it trades storage savings for coupling and complexity. Sometimes, duplication is cheaper than coupling. https://t.co/KokWXANd6z
jack-vanlightly.com
Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics...
Why don’t Iceberg or Delta Lake have secondary indexes? Because analytics workloads and OLTP workloads optimize for opposite I/O patterns. See my dive into data layout, pruning, and what “indexing” really means in open table formats:
jack-vanlightly.com
My career in data started as a SQL Server performance specialist, which meant I was deep into the nuances of indexes, locking and blocking, execution plan analysis and query design. These days I’m...
New deep dive: Understanding Apache Fluss. I spent August reverse-engineering Fluss, Alibaba’s new table storage engine for Flink (partially forked from Kafka). This post covers its architecture, tiering, and how it tackles changelogs & low-latency state. https://t.co/JQEKngD4Nn
jack-vanlightly.com
This is a data system internals blog post. So if you enjoyed my table formats internals blog posts, or writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy this one....
There's been some talk around storage unification and zero-copy lakehouse integrations recently, and I wanted to better define these terms as well as look at ways we should evaluate different design decisions in this space. So I’ve published a new post: A Conceptual Model for
jack-vanlightly.com
Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and...
In a future of autonomous AI agents, we can't limit ourselves to error prevention and error detection, we must also include remediation. But when AI loses touch with reality due to hallucinations, confabulation and misinterpretation, who does the remediation? In cases of
jack-vanlightly.com
If you’re following the world of AI right now, no doubt you saw Jason Lemkin’s post on social media reporting how Replit’s AI deleted his production database, despite it being told not to touch...
Science moves slowly because wrong theories waste decades. Engineering is careful because failures kill people. Software moves fast because mistakes are cheap: the expensive error isn't making the wrong choice, it's taking too long to make any choice. https://t.co/Bp6sAEBdIF
jack-vanlightly.com
A recent LinkedIn post by Nick Lebesis caught my attention with this brutal take on the difference between good startup founders and coward startup founders. I recommend you read the entire thing to...
A new case study is born
In distributed systems, reliability isn’t just about retries and durability, it’s about knowing who owns recovery. My latest post, based on the Coordinated Progress model I posted previously, explores how reliable triggers create responsibility boundaries and how those boundaries
jack-vanlightly.com
Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries. Where a reliable trigger...
Over the past few months, I’ve been thinking deeply about how systems make progress reliably in the face of partial failures, service boundaries, retries, and complex dependencies. Building reliable workflows across microservices, functions, and stream processors is one of the
jack-vanlightly.com
At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it...
How to reliably distribute work across microservices, stream processors, durable execution, event-driven, orchestration and now AI agents? Coordinated Progress is a 4-part series that explores the common structure behind reliable distributed systems. https://t.co/fTnkrXljCv
jack-vanlightly.com
At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it...
Another Humans of the Data Sphere is out, with issue 10! In this issue people are talking fsyncs, tips for running ClickHouse at scale, the problems with MCP and more. Plus I dig up a classic paper from 1962.
hotds.dev
Your biweekly dose of insights, observations, commentary and opinions from interesting people from the world of databases, AI, streaming, distributed systems and the data engineering/analytics space.
The specs are in this repo for anyone interested.
github.com
TLA+ specifications for Kafka related algorithms. Contribute to Vanlightly/kafka-tlaplus development by creating an account on GitHub.
Proud to have contributed formal verification (TLA+) for three key improvements in Kafka 4.0: ✅ KIP-966: Strengthens the replication protocol. ✅ KIP-996: Introduces PreVote for more stable KRaft leadership. ✅ KIP-848: Delivers more efficient, predictable rebalancing.
Seems like I’m not alone. For what it’s worth, I’ve got a great fit at Confluent — but the more senior I get, the more I wonder how sustainable that is across future PE roles. Thinking of writing a blog post, maybe with interviews or perspectives from PEs who aren’t natural cat
Any Principal Engineers out there with ADHD or creative wiring — who don’t thrive in the tasks of project coordination, alignment meetings, and people management, but thrive on strategy, system design, writing, and shaping direction through ideas? Curious how you navigate the