Andrew Lamb

@andrewlamb1111

Followers: 3K
Following: 406
Media: 99
Statuses: 746

Apache {DataFusion, Arrow} PMC, Database Engineer

Joined November 2020
@andrewlamb1111
Andrew Lamb
3 days
Prateek Gaur and co at @Snowflake reproduced the (great) results for the ALP encoding algorithm from @cwi_da / @afroozeh3 / @peterabcz. ALP achieves ZSTD levels of compression and much faster decode. We are discussing adding it to @ApacheParquet: https://t.co/gxwF5QqtNO
0
10
76
@andrewlamb1111
Andrew Lamb
5 days
@SpiralDB @CMUDB @ApacheParquet The idea of using WASM as a forward compatibility mechanism I thought was especially neat
0
0
5
@andrewlamb1111
Andrew Lamb
5 days
The talk on @SpiralDB at @CMUDB https://t.co/6mRfsnDZiP is a great one. I think it would also be interesting to hear a counterpoint about @ApacheParquet that explains actual technical details of that format, the Cathedral vs Bazaar management, options with Metadata, etc
1
15
112
@andrewlamb1111
Andrew Lamb
10 days
Our new thrift parser in the Rust @ApacheParquet implementation is a 🎁 that keeps on giving performance wise 🚀 https://t.co/b6lHJbxQzd We are also working on a blog post that has a deeper explanation
2
8
137
@andrewlamb1111
Andrew Lamb
11 days
Yesterday I learned about the SpatialBench from Sedona https://t.co/I7MYOptkuK which they based on our tpchgen-rs project: https://t.co/PR0F0AS9SD (BTW I am still looking for some more GitHub watchers on tpchgen-rs so I can get it on homebrew)
0
2
32
@andrewlamb1111
Andrew Lamb
13 days
BTW if anyone wants a good intro to database storage / Log structured storage (aka LSM trees), the @CMUDB lecture this fall is a good one:
0
30
281
@andrewlamb1111
Andrew Lamb
18 days
It starts: https://t.co/0fhieCL0BX clflushopt is going to make the world's fastest tpc-ds generator
github.com
WIP (out of tree) Rust implementation of TPC-DS generators. - clflushopt/tpcdsgen
2
3
31
@andrewlamb1111
Andrew Lamb
19 days
I am proud to announce I am now a committer on the @ApacheParquet project. Realistically this likely means more reviews / helping clarify the parquet specs, but I also hope to help more actively evolve the format, especially around new encodings. https://t.co/lnR71Po1yA
5
2
109
@RitchieVink
Ritchie Vink
21 days
I am really proud to announce that we raised €18M in series A. We have got big plans on improving Polars. Great things to come!
@DataPolars
polars data
21 days
We raised €18M in Series A led by @Accel to build fast data processing at any scale. All on Polars. https://t.co/Qy13YezymD
5
2
63
@andrewlamb1111
Andrew Lamb
21 days
DataFusion 50 is released. Read all about it here: https://t.co/B2OHkMPL3h
3
5
62
@andrewlamb1111
Andrew Lamb
25 days
@ApacheParquet @ApacheDataFusio Check out the follow on from @jcsherin who used these techniques to put full text indexes in parquet:
1
0
7
@andrewlamb1111
Andrew Lamb
25 days
So cool: @jcsherin added full text indexes into Parquet files using the techniques from our blog https://t.co/t0eDGHeG9c
4
9
50
@andrewlamb1111
Andrew Lamb
26 days
"Introducing SedonaDB: A single-node analytical database engine with geospatial as a first-class citizen" Built in Rust with @ApacheDataFusio https://t.co/bsneiAJFRv
0
4
32
@andrewlamb1111
Andrew Lamb
1 month
We just published an easier to find list of all PMC and committers on @ApacheDataFusio, and it is quite a cool list of people and affiliations if I do say so myself 🤗 https://t.co/OOYNgf58eZ
1
5
33
@andrewlamb1111
Andrew Lamb
1 month
It was a great time on Monday at the @ApacheDataFusio meetup in NYC. We heard about distributed query plans, filter pushdown, geospatial support, and VegaFusion. More deets here https://t.co/Axugrv05P3
1
2
26
@andrewlamb1111
Andrew Lamb
1 month
6 hours to generate TPCH SF750000 dataset using a worker pool of 1000 parallel processes (spread across 25 VMs). BTW SF750000 is 750 TB raw / 220 TB parquet. https://t.co/Piqq2ubVIw
github.com
Tracking Issue for v2.0.0 Open issues related to v2.0.0 Memory size growth (#76 & #150) #152 #146 #80 (not sure if we want to include this one) #145 We dont have to resolve all of these, I thin...
1
3
26