Andrew Lamb
@andrewlamb1111
4K Followers · 434 Following · 111 Media · 783 Statuses
Apache {DataFusion, Arrow} PMC, Database Engineer
Joined November 2020
Qiwei Huang explains how we use Late Materialization (LM) in the Apache Rust Parquet reader to accelerate filtering. LM can describe several techniques, but this is a core one (it also applies to joins, Top-K, etc.): https://t.co/F4Lqc29IeH
In the past few years, we’ve seen a Cambrian explosion of new columnar formats, challenging the hegemony of Parquet. Presumably, the design of yore is not going to cut it moving forward. I spent some time trying to understand what has actually changed. https://t.co/BxtXabynQ8
sympathetic.ink
In the past few years, we’ve seen a Cambrian explosion of new columnar formats, challenging the hegemony of Parquet: Lance, Fastlanes, Nimble, Vortex, AnyBlox, F3 (File Format for the Future). The...
"Column Storage for the AI Era" Enjoyed reading this post about the current state and future of #Parquet, by Julien Le Dem (@J_). New encodings, championed by new columnar formats like Lance, taking advantage of SIMD, etc. Exciting times. 👉 https://t.co/uuYRrnbJqf
Thanks to https://t.co/jc4P72tdmO, we are hosting a DataFusion meetup in Stockholm.
Date: Thursday, March 5, 2026, 17:30 - 20:00
Signup:
luma.com
Join us for an evening of talks, panel discussions, and community discussions about Apache DataFusion and its growing role in modern data infrastructure. This…
There is some crazy (good) activity on the @ApacheParquet mailing list for new encodings. A sample: PFOR, FSST, ALP, Strings and Cascaded Encodings. 🤯 Huge kudos to Arnav Balyan, Prateek Gaur, and Micah Kornfield for driving this. https://t.co/YRE13QEm2j
Here are the slides and recordings from our Boston DataFusion Meetup in September: Youtube: https://t.co/TGgK79oKjd Slides (pdf): https://t.co/kSKpuTFRux
Why rebuild the wheel? @olimpiupop talks with @andrewlamb1111 about how Apache Arrow, Parquet, and the FDAP stack are letting database teams focus on innovation instead of reinventing the basics. https://t.co/9EfG3SgcXq
Building Modern Databases with the FDAP Stack • Andrew Lamb & Olimpiu Pop • GOTO 2025 https://t.co/R5ty8GtDdo
Does anyone know a good academic / industrial overview of how to implement (not use) LATERAL joins in SQL? It keeps coming up in @ApacheDataFusio and I need to get reasonable background on it. https://t.co/8mZXmOxDuP
Save the date -- Wednesday July 22, 2026 for the first Apache DataFusion meetup in Denver:
luma.com
Join us for an evening of talks, panel discussion, and community discussion about Apache DataFusion and its growing role in modern data infrastructure. We will…
One fun nugget from the Boston @ApacheDataFusio meetup on Wednesday: Datadog reports they run 68+ million queries per hour with DataFusion.
Coming soon in arrow-rs: adaptive predicate pushdown https://t.co/72KRzneR8e (aka we are really close to turning on late materialization by default in Parquet scans in @ApacheDataFusio)
github.com
Which issue does this PR close? Closes [Parquet]Performance Degradation with RowFilter on Unsorted Columns due to Fragmented ReadPlan #8565 Closes Adaptive Parquet Predicate Pushdown Evaluation #5...
An interesting take on the composable data stack: a unified distributed execution framework (Sail) will drive innovation in distributed systems implementation the way @ApacheDataFusio et al. have been driving innovation in the computation layer.
For too long, the composable data stack has lacked a solid distributed compute layer. Our latest blog post covers why we believe Sail is the last missing piece, definitely worth a read. Read the full post:
Here is a nice examination of the benefits of building new systems using the extensibility of @ApacheDataFusio vs other systems.
In the eternal struggle between Good vs Evil, Blur vs Oasis, and @duckdb vs @ApacheDataFusio, we just switched to DataFusion after 18 months, while keeping our #FaaS magic intact: https://t.co/wF2GsLkPHq
"If you want to go fast, go alone; if you want to go far, go together." The new Apache Parquet Community page is up: https://t.co/brxdfHc6Kc
We are holding the next @ApacheDataFusio meetup next Wednesday Nov 12 in Boston.
luma.com
Join us for an evening of talks, panel discussion, and community discussion about Apache DataFusion and its growing role in modern data infrastructure. This…
If anyone wants to know why Xiangpeng Hao is a great mentor, they can read this response: https://t.co/Zvu9c046DJ