Jack Vanlightly

@vanlightly

Followers
5K
Following
1K
Media
105
Statuses
2K

@confluentinc thinking about event streaming. Ex @Splunk, @VMware https://t.co/3axXZezyy4, https://t.co/voJWmL4KM6 Credit: ESO/B. Tafreshi

Barcelona, Spain
Joined November 2016
@vanlightly
Jack Vanlightly
9 days
New blog post! "How Would You Like Your Iceberg Sir? Stream or Batch Ordered?" Many teams today are trying to unify their streaming and batch pipelines over Apache Iceberg. But physical unification comes with trade-offs: 🔹 Flink streaming mode and other stream processors want
jack-vanlightly.com
Today I want to talk about stream analytics, batch analytics and Apache Iceberg. Stream and batch analytics work differently but both can be built on top of Iceberg, but due to their differences...
0
16
85
@vanlightly
Jack Vanlightly
10 days
It's that time of year again, Christmas is around the corner and my wife has knitted many more shawls! Check them out, and use the code JACKV for a 10% discount. Extra points to guess who is modeling them in the photos (spoiler, it's me 😄). https://t.co/qXmgDzsiL9
etsy.com
Shop items by SquiggleKnit.
0
0
1
@vanlightly
Jack Vanlightly
23 days
Three KIPs (1150, 1176, 1183) all target Kafka’s cross-AZ replication costs but there is a wider question at stake. My new post explains the KIPs, the trade-offs between reusing old abstractions vs. embracing stateless compute over S3. https://t.co/GAUHFXSpw2
jack-vanlightly.com
“The Kafka community is currently seeing an unprecedented situation with three KIPs (KIP-1150, KIP-1176, KIP-1183) simultaneously addressing the same challenge of high replication costs when...
0
7
50
@vanlightly
Jack Vanlightly
28 days
Data Science Weekly Issue 621 (last week poll result). I am in the "hate" group if only because it smacks of laziness, unoriginality and outsourcing of thought. ChatGPT can be a great writing/analysis assistant (it's a decent reviewer) but copy and paste wholesale just
0
0
4
@vanlightly
Jack Vanlightly
30 days
New post: why I’m not a fan of “zero-copy” Iceberg tables for Apache Kafka. From a systems design view, it trades storage savings for coupling and complexity. Sometimes, duplication is cheaper than coupling. https://t.co/KokWXANd6z
jack-vanlightly.com
Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics...
1
19
193
@vanlightly
Jack Vanlightly
1 month
Why don’t Iceberg or Delta Lake have secondary indexes? Because analytics workloads and OLTP workloads optimize for opposite I/O patterns. See my dive into data layout, pruning, and what “indexing” really means in open table formats:
jack-vanlightly.com
My career in data started as a SQL Server performance specialist, which meant I was deep into the nuances of indexes, locking and blocking, execution plan analysis and query design. These days I’m...
2
30
200
@vanlightly
Jack Vanlightly
2 months
New deep dive: Understanding Apache Fluss. I spent August reverse-engineering Fluss, Alibaba’s new table storage engine for Flink (partially forked from Kafka). This post covers its architecture, tiering, and how it tackles changelogs & low-latency state. https://t.co/JQEKngD4Nn
jack-vanlightly.com
This is a data system internals blog post. So if you enjoyed my table formats internals blog posts, or writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy this one....
2
18
135
@vanlightly
Jack Vanlightly
3 months
There's been some talk around storage unification and zero-copy lakehouse integrations recently, and I wanted to better define these terms as well as look at ways we should evaluate different design decisions in this space. So I’ve published a new post: A Conceptual Model for
jack-vanlightly.com
Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and...
0
6
48
@vanlightly
Jack Vanlightly
4 months
In a future of autonomous AI agents, we can't limit ourselves to error prevention and error detection, we must also include remediation. But when AI loses touch with reality due to hallucinations, confabulation and misinterpretation, who does the remediation? In cases of
jack-vanlightly.com
If you’re following the world of AI right now, no doubt you saw Jason Lemkin’s post on social media reporting how Replit’s AI deleted his production database, despite it being told not to touch...
0
0
2
@vanlightly
Jack Vanlightly
4 months
Science moves slowly because wrong theories waste decades. Engineering is careful because failures kill people. Software moves fast because mistakes are cheap; the expensive error isn't making the wrong choice, it's taking too long to make any choice. https://t.co/Bp6sAEBdIF
jack-vanlightly.com
A recent LinkedIn post by Nick Lebesis caught my attention with this brutal take on the difference between good startup founders and coward startup founders. I recommend you read the entire thing to...
3
11
57
@vanlightly
Jack Vanlightly
4 months
A new case study is born
@jasonlk
Jason ✨👾SaaStr.Ai✨ Lemkin
4 months
.@Replit goes rogue during a code freeze and shutdown and deletes our entire database
0
0
8
@vanlightly
Jack Vanlightly
4 months
In distributed systems, reliability isn’t just about retries and durability, it’s about knowing who owns recovery. My latest post, based on the Coordinated Progress model I posted previously, explores how reliable triggers create responsibility boundaries and how those boundaries
jack-vanlightly.com
Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries. Where a reliable trigger...
0
15
101
@vanlightly
Jack Vanlightly
5 months
Over the past few months, I’ve been thinking deeply about how systems make progress reliably in the face of partial failures, service boundaries, retries, and complex dependencies. Building reliable workflows across microservices, functions, and stream processors is one of the
jack-vanlightly.com
At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it...
3
16
61
@vanlightly
Jack Vanlightly
5 months
How to reliably distribute work across microservices, stream processors, durable execution, event-driven, orchestration and now AI agents? Coordinated Progress is a 4-part series that explores the common structure behind reliable distributed systems. https://t.co/fTnkrXljCv
jack-vanlightly.com
At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it...
1
31
180
@vanlightly
Jack Vanlightly
7 months
Another Humans of the Data Sphere is out, with issue 10! In this issue people are talking fsyncs, tips for running ClickHouse at scale, the problems with MCP and more. Plus I dig up a classic paper from 1962.
hotds.dev
Your biweekly dose of insights, observations, commentary and opinions from interesting people from the world of databases, AI, streaming, distributed systems and the data engineering/analytics space.
0
2
7
@vanlightly
Jack Vanlightly
7 months
0
0
3
@vanlightly
Jack Vanlightly
7 months
Proud to have contributed formal verification (TLA+) for three key improvements in Kafka 4.0: ✅ KIP-966: Strengthens the replication protocol. ✅ KIP-996: Introduces PreVote for more stable KRaft leadership. ✅ KIP-848: Delivers more efficient, predictable rebalancing.
4
20
114
@vanlightly
Jack Vanlightly
8 months
Seems like I’m not alone. For what it’s worth, I’ve got a great fit at Confluent — but the more senior I get, the more I wonder how sustainable that is across future PE roles. Thinking of writing a blog post, maybe with interviews or perspectives from PEs who aren’t natural cat
@vanlightly
Jack Vanlightly
8 months
Any Principal Engineers out there with ADHD or creative wiring — who don’t thrive in the tasks of project coordination, alignment meetings, and people management, but thrive on strategy, system design, writing, and shaping direction through ideas? Curious how you navigate the
0
1
30