Max Gabrielsson
@Maxxen_
Followers: 305
Following: 222
Media: 13
Statuses: 222
Software Engineer @ DuckDB Labs Compiler/Database/Web/Distributed/Low-Level/Constraint/Geospatial/Parallel/Game/Systems/All-Around Programming
Joined June 2012
Today we're launching DuckLake, an integrated data lake and catalog format powered by SQL. DuckLake unlocks next-generation data warehousing where compute is local, consistency central, and storage scales till infinity. DuckLake is an open standard and we've implemented it in
21
198
688
Or ducked in general. You're losing so many of the benefits of vectorized execution if you trash the cache with allocator calls - not to mention the contention caused when allocating from different threads.
1
0
1
And this is after spending all night trying to optimize this - removing recursion from deserialization, caching already constructed geometries, pooling vectors etc etc. This is why I'm so pedantic about trying to minimize allocations in the context of duckdb-spatial.
1
0
2
Here are some interesting profiling results. When DuckDB executes a large join with the new spatial join operator, it spends about 50% of the time just _deserializing_ and constructing GEOS geometries, 15% just calling GEOS destructors, and just 2% evaluating the predicate 🥲
2
1
41
It's happening, y'all: DuckDB's spatial joins are about to get good, actually. https://t.co/o3M7TMUlV0
github.com
This PR updates the workflow and pinned DuckDB version in the v1.2.2 branch to DuckDB v1.2.2. It also adds a new physical operator, the SPATIAL_JOIN. Spatial Joins Executing spatial joins in DuckDB...
2
14
118
Experimenting with CRS tracking support in DuckDB Spatial 🗺️👀
1
5
19
New blog post: Query Engines: Gatekeepers of the Parquet File Format In this post, Laurens Kuiper argues that we are wasting a lot of bits by not using the Parquet format to its full extent – a limitation caused by the lack of support for Parquet features in some systems.
5
39
194
DuckDB has a new Node.js client with more features and a more idiomatic API. Read the guest blog post by Jeff Raymakers (MotherDuck) for more details: https://t.co/KlVTPMbQ37
1
8
57
A few weeks ago, we got curious – how well does DuckDB perform on phones? We ventured to find this out by running the TPC-H SF100 queries on both Android and iPhone. Read more in our latest blog post: https://t.co/UdjtXP5k7X
3
10
69
Oh yeah, I'm on the other site as well! https://t.co/QLNIMVy1NN
0
0
1
𝗙𝗮𝘀𝘁𝗲𝗿 𝘁𝗼𝗽 𝗡 𝗳𝗼𝗿 𝗮𝗴𝗴𝗿𝗲𝗴𝗮𝘁𝗲 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀 The end of the week brings us another blog post. @__AlexMonahan__ takes a deep dive into the top N capabilities in DuckDB: https://t.co/LsfbZ6fN3Q Happy Friday!
duckdb.org
Find the top N values or filter to the latest N rows more quickly and easily with the N parameter in the min, max, min_by, and max_by aggregate functions.
0
15
89
𝗨𝗽𝗱𝗮𝘁𝗲𝘀 𝗼𝗻 𝗩𝗦𝗦 𝗘𝘅𝘁𝗲𝗻𝘀𝗶𝗼𝗻 In this week’s blog post, Max Gabrielsson dives into the details of the new features and improvements in the Vector Similarity Search Extension:
duckdb.org
DuckDB is another step closer to becoming a vector database! In this post, we show the new performance optimizations implemented in the vector search extension.
0
7
59
𝐔𝐩𝐝𝐚𝐭𝐞𝐝 𝐛𝐞𝐧𝐜𝐡𝐦𝐚𝐫𝐤 𝐬𝐮𝐢𝐭𝐞 In this week’s blog post, @holanda_pe takes a deep dive into DuckDB’s CSV performance with the well-known NYC Taxi Dataset:
duckdb.org
DuckDB's benchmark suite now includes the NYC Taxi Benchmark. We explain how our CSV reader performs on the Taxi Dataset and provide steps to reproduce the benchmark.
0
9
51
New blog post by @__AlexMonahan__: DuckDB in Python in the Browser with Pyodide, PyScript, and JupyterLite In this post, Alex explains how you can set up a fully in-browser DuckDB notebook in seconds using Pyodide. https://t.co/rC94gWpMRr
2
27
145
New blog post in collaboration with @ncclementi on using Ibis + DuckDB + Lonboard for analyzing and visualizing large geospatial data https://t.co/fx2v4ltQ3P
1
27
119