Andrew Lamb @andrewlamb1111 X Profile

Followers

4K

Following

434

Media

111

Statuses

783

Apache {DataFusion, Arrow} PMC, Database Engineer

https://t.co/OCgMKhnyD8

Joined November 2020

Andrew Lamb

@andrewlamb1111

5 days

Qiwei Huang explains how we use Late Materialization (LM) in the Apache Rust Parquet reader to accelerate filtering. LM can describe several techniques, but this is a core one (also applies to joins, Top-K, etc) https://t.co/F4Lqc29IeH

6

12

94

Julien Le Dem

@J_

5 days

In the past few years, we’ve seen a Cambrian explosion of new columnar formats, challenging the hegemony of Parquet. Presumably, the design of yore is not going to cut it moving forward. I spent some time understanding how things actually changed. https://t.co/BxtXabynQ8

sympathetic.ink

In the past few years, we’ve seen a Cambrian explosion of new columnar formats, challenging the hegemony of Parquet: Lance, Fastlanes, Nimble, Vortex, AnyBlox, F3 (File Format for the Future). The...

3

15

88

Gunnar Morling 🌍

@gunnarmorling

5 days

"Column Storage for the AI Era" Enjoyed reading this post about the current state and future of #Parquet, by Julien Le Dem (@J_). New encodings, championed by new columnar formats like Lance, taking advantage of SIMD, etc. Exciting times. 👉 https://t.co/uuYRrnbJqf

4

21

135

Andrew Lamb

@andrewlamb1111

7 days

Thanks to https://t.co/jc4P72tdmO, we are hosting a DataFusion meetup in Stockholm. Date: Thursday, March 5, 2026, 17:30 - 20:00. Signup:

luma.com

Join us for an evening of talks, panel discussions, and community discussions about Apache DataFusion and its growing role in modern data infrastructure. This…

0

2

14

Andrew Lamb

@andrewlamb1111

9 days

There is some crazy (good) activity on the @ApacheParquet mailing list for new encodings. A sample: PFOR, FSST, ALP, Strings and Cascaded Encodings. 🤯 Huge kudos to Arnav Balyan, Prateek Gaur, and Micah Kornfield for driving this. https://t.co/YRE13QEm2j

1

7

61

Andrew Lamb

@andrewlamb1111

13 days

Here are the slides and recordings from our Boston DataFusion Meetup in September: YouTube: https://t.co/TGgK79oKjd Slides (PDF): https://t.co/kSKpuTFRux

1

6

47

Andrew Lamb

@andrewlamb1111

21 days

DataFusion 51.0.0 Release Notes: https://t.co/Si4f1g4JNE

1

4

46

GOTO

@GOTOcon

23 days

Why rebuild the wheel? @olimpiupop talks with @andrewlamb1111 about how Apache Arrow, Parquet, and the FDAP stack are letting database teams focus on innovation instead of reinventing the basics. https://t.co/9EfG3SgcXq

0

1

4

Andrew Lamb

@andrewlamb1111

23 days

Building Modern Databases with the FDAP Stack • Andrew Lamb & Olimpiu Pop • GOTO 2025 https://t.co/R5ty8GtDdo

0

7

68

Andrew Lamb

@andrewlamb1111

24 days

Does anyone know a good academic / industrial overview of how to implement (not use) LATERAL joins in SQL? It keeps coming up in @ApacheDataFusio and I need to get reasonable background on it. https://t.co/8mZXmOxDuP

5

2

38

Andrew Lamb

@andrewlamb1111

24 days

Save the date: Wednesday, July 22, 2026, for the first Apache DataFusion meetup in Denver:

luma.com

Join us for an evening of talks, panel discussion, and community discussion about Apache DataFusion and its growing role in modern data infrastructure. We will…

0

7

Andrew Lamb

@andrewlamb1111

25 days

Jobs at F5 working on Arrow and @ApacheDataFusio:

0

1

5

Andrew Lamb

@andrewlamb1111

1 month

One fun nugget from the Boston @ApacheDataFusio meetup on Wednesday: Datadog reports they run 68+ million queries per hour with DataFusion.

2

4

52

Andrew Lamb

@andrewlamb1111

1 month

Coming soon in arrow-rs: adaptive predicate pushdown https://t.co/72KRzneR8e (aka we are really close to turning on late materialization by default in parquet scans in @ApacheDataFusio )

github.com

Which issue does this PR close? Closes [Parquet]Performance Degradation with RowFilter on Unsorted Columns due to Fragmented ReadPlan #8565 Closes Adaptive Parquet Predicate Pushdown Evaluation #5...

0

10

73

Andrew Lamb

@andrewlamb1111

1 month

An interesting take on the composable data stack: a unified distributed execution framework (Sail) will drive innovation in distributed systems implementation the way @ApacheDataFusio et al. have been driving innovation in the computation layer.

Shehab Amin

@shehab_amins

1 month

For too long, the composable data stack has lacked a solid distributed compute layer. Our latest blog post covers why we believe Sail is the last missing piece, definitely worth a read. Read the full post:

0

4

33

Andrew Lamb

@andrewlamb1111

1 month

Here is a nice examination of the benefits of building new systems using the extensibility of @ApacheDataFusio vs other systems.

Jacopo Tagliabue

@jacopotagliabue

1 month

In the eternal struggle between Good vs Evil, Blur vs Oasis, and @duckdb vs @ApacheDataFusio , we just switched to DataFusion after 18 months, while keeping our #FaaS magic intact: https://t.co/wF2GsLkPHq

0

7

66
