New blog post by @mraasveldt: Multi-Database Support in DuckDB
DuckDB can now attach MySQL, Postgres, and SQLite databases in addition to databases stored in its own format. This allows data to be read into DuckDB and moved between these systems in a
DuckDB was recently covered in @andy_pavlo's Advanced Database Systems course at CMU. The lecture covers DuckDB's history, internals, and integration with other systems.
Slides:
Recording:
We are proud to release DuckDB v0.10.0:
Some highlights:
– A reworked and much faster CSV reader
– Fixed-length arrays
– Multi-database support
– Secrets manager
– Temporary memory manager
– Adaptive lossless floating-point compression
– New CLI editor
DuckDB 0.7.0 "Labradorius" released with #JSON support, parallel and partitioned export to CSV and Parquet, UPSERT, @DataPolars integration, and much more. Read all about it in our release announcement blog post:
DuckDB is introducing support for vector similarity search through the new VSS extension.
Read @Maxxen_'s blog post for a sneak preview of the new extension's capabilities:
We wrote a performance guide for DuckDB users! This guide covers topics such as the effects of schema (constraints, indexing) and hardware (CPU, memory, disk). We also share best practices for querying Parquet files and tips for tuning your workload.
New blog post by @lnkuiper – No Memory? No Problem. External Aggregation in DuckDB
The post describes how DuckDB can efficiently aggregate over many more groups than fit in memory, allowing it to complete the 50 GB variant of the
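To make the idea more concrete, here is a toy Python sketch of classic partitioned (Grace-style) external hash aggregation. It is not DuckDB's algorithm, which is implemented in C++ and described in the post. It only illustrates the general principle under simplifying assumptions: rows are (key, value) pairs, the aggregate is SUM, and the function name external_sum and its num_partitions parameter are hypothetical. Rows are first spilled to one temporary file per hash partition. Each partition is then aggregated on its own, so only one partition's groups need to be held in memory at a time.

```python
import os
import pickle
import tempfile
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def external_sum(rows: Iterable[Tuple[Hashable, float]],
                 num_partitions: int = 4) -> Dict[Hashable, float]:
    """Toy external hash aggregation: SUM(value) GROUP BY key.

    Phase 1 spills rows to one temporary file per hash partition.
    Phase 2 aggregates each partition in memory, one at a time, so the
    in-memory hash table only ever holds one partition's groups.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [os.path.join(tmpdir, f"part{i}.pkl") for i in range(num_partitions)]
        files = [open(p, "wb") for p in paths]
        try:
            # Phase 1: route every row to a partition by hashing its key.
            for key, value in rows:
                pickle.dump((key, value), files[hash(key) % num_partitions])
        finally:
            for f in files:
                f.close()

        result: Dict[Hashable, float] = {}
        # Phase 2: all rows with the same key landed in the same partition,
        # so each partition can be aggregated independently.
        for path in paths:
            partial: Dict[Hashable, float] = defaultdict(float)
            with open(path, "rb") as f:
                while True:
                    try:
                        key, value = pickle.load(f)
                    except EOFError:  # end of this partition's spill file
                        break
                    partial[key] += value
            result.update(partial)
        return result


print(sorted(external_sum([("a", 1), ("b", 2), ("a", 3)]).items()))
# [('a', 4.0), ('b', 2.0)]
```

Using more partitions shrinks the number of groups per partition, and with it the peak memory, at the cost of more temporary files.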
DuckDB's co-creator, Hannes Mühleisen, recently became a professor of data engineering at Radboud University. The recording of his inaugural lecture, titled "The Ancient Art of Data Management", is now available.
DuckDB 0.6.0 "Oxyura" released with improved storage, higher performance for CSV loading and indexing, new SQL syntax, better memory management, shell tweaks, and so many new features that @mraasveldt wrote a separate blog post to explain it all:
DuckDB supports querying buckets in AWS S3 Express One Zone. Read the related guide, which shows that DuckDB can read a Parquet file from an S3 Express One Zone bucket at about 1.2 gigabytes per second!
PS: You may also have noticed that we started rolling
New blog post by @__AlexMonahan__ – SQL Gymnastics: Bending SQL into flexible new shapes
In this post, Alex presents pure SQL queries to implement dynamic groupings and aggregate functions using DuckDB's friendly SQL extensions. The queries can be used to
We have revamped one of our core operators, aggregation. It now scales better both for many unique groups and for a large number of cores, so you can expect better performance when running large aggregations on big machines.
The Awesome DuckDB repository, maintained by @davidgasquez, has grown to more than 100 entries in less than a year. If you are aware of more cool projects using DuckDB, please consider submitting a PR!
Did you know that you can connect to a DuckDB database file via HTTPS or S3 with just two SQL statements? We have a new guide that explains how to do this.
New blog post by @hfmuehleisen – duckplyr: dplyr powered by DuckDB
The post describes the new R package duckplyr, which translates the dplyr API to DuckDB’s execution engine.
Read more at
There are now a lot of handy tools and cool projects built around DuckDB. You can find a list of these in the Awesome DuckDB repository maintained by @davidgasquez.
See the list and contribute your project at
New blog post: JupySQL enables SQL cells in Jupyter, supports DuckDB, and can also plot larger-than-memory datasets using DuckDB! JupySQL is an actively developed fork of ipython-sql that is being enhanced by the folks at @ploomber. Let us know what you think!
The new DuckDB landing page has several code snippets for SQL features and DuckDB's APIs. You can use the "Live Demo" button to execute the queries on an example dataset in your browser, using the DuckDB shell that runs in WebAssembly.
Note: the demo
We have started publishing the recordings of DuckCon #4. First up is the “State of the Duck” talk by DuckDB's co-creators, Hannes Mühleisen (@hfmuehleisen) and Mark Raasveldt (@mraasveldt).
Video:
Slides:
Special thanks
Lambda functions are one of the most popular features in DuckDB. We recently added list_reduce, a new scalar function that supports lambdas, and lambda functions now have their own documentation page.
Note that this feature is currently only available in DuckDB's
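As a rough illustration of what list_reduce computes, the Python sketch below models it with functools.reduce. The lambda is applied cumulatively from left to right, to a running result and the next list element. In DuckDB's lambda syntax this corresponds to something like list_reduce([1, 2, 3, 4], (x, y) -> x + y). The Python function list_reduce here is a hypothetical stand-in, and it does not model how DuckDB treats empty lists or NULLs.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def list_reduce(values: List[T], fn: Callable[[T, T], T]) -> T:
    """Hypothetical Python stand-in for DuckDB's list_reduce.

    Folds the list from left to right: fn(fn(fn(v0, v1), v2), v3), ...
    Empty lists and NULL handling are deliberately not modeled.
    """
    return reduce(fn, values)


print(list_reduce([1, 2, 3, 4], lambda x, y: x + y))                # 10
print(list_reduce(["a", "b", "c"], lambda acc, s: acc + "-" + s))  # a-b-c
```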
New blog post by @carlo_piovesan: Extensions for DuckDB-Wasm
Thanks to recent developments, DuckDB-Wasm users can now load DuckDB extensions and run them in the browser.
DuckDB was included in @InfoWorlds's best open-source software list as a "tiny-but-powerful project" that provides just enough OLAP for most use cases. The award praised DuckDB's lightweight nature and its many features.
This blog post is a short summary of the ICDE 2024 (@icdeconf) paper authored by @lnkuiper, @peterabcz, and @hfmuehleisen: Robust External Hash Aggregation in the Solid State Age.
The paper is available at
New post by @lnkuiper: Shredding deeply nested #JSON one vector at a time
Querying JSON as a table is as easy as SELECT * FROM 'file.json';
It's fast too, thanks to DuckDB's lists/structs and the yyjson parser by @ibireme.
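"Shredding" refers to turning nested JSON documents into columnar vectors. Below is a toy Python sketch of that idea. It is not DuckDB's implementation, which relies on the yyjson parser and DuckDB's own list and struct types. Here, objects are split recursively into one vector per key, and missing keys become None. The function name shred is hypothetical, and for simplicity the sketch does not distinguish a NULL parent object from NULL children.

```python
import json
from typing import Any, List


def shred(records: List[Any]) -> Any:
    """Toy column-wise 'shredding' of a vector of JSON values.

    - A vector of objects becomes a dict mapping each key to the
      recursively shredded vector of that key's values (missing -> None).
    - Any other vector (numbers, strings, arrays, mixed types) is
      returned unchanged; real engines also shred array children.
    """
    non_null = [r for r in records if r is not None]
    if non_null and all(isinstance(r, dict) for r in non_null):
        # Union of keys, in order of first appearance.
        keys = list(dict.fromkeys(k for r in non_null for k in r))
        return {
            k: shred([r.get(k) if r is not None else None for r in records])
            for k in keys
        }
    return records


lines = [
    '{"id": 1, "user": {"name": "ann", "age": 30}}',
    '{"id": 2, "user": {"name": "bob"}}',
]
print(shred([json.loads(line) for line in lines]))
# {'id': [1, 2], 'user': {'name': ['ann', 'bob'], 'age': [30, None]}}
```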
We rolled out an updated syntax highlighter and a new color scheme in the DuckDB documentation.
The highlighter now knows all of DuckDB's keywords and functions. The color scheme is based on the Bluloco theme.
Did you know that DuckDB supports function chaining? This allows function calls to be rewritten in a more readable manner. See the Even Friendlier SQL with DuckDB blog post for details:
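Function chaining is a syntactic rewrite: a call written as x.f(a) means f(x, a), so each result in a chain becomes the first argument of the next function. The hypothetical Python helper below models that rewrite on SQL expression strings. desugar_chain is not part of any DuckDB API, and the sketch assumes each chained call is given as a (function name, extra arguments) pair.

```python
from typing import List, Tuple


def desugar_chain(base: str, calls: List[Tuple[str, List[str]]]) -> str:
    """Rewrite base.f1(a).f2(b) into f2(f1(base, a), b).

    Each (name, args) pair is one chained call; the expression built
    so far is passed as the first argument of the next function.
    """
    expr = base
    for name, args in calls:
        expr = f"{name}({', '.join([expr, *args])})"
    return expr


# ('hello world').replace(' ', '_').upper()
print(desugar_chain("'hello world'", [("replace", ["' '", "'_'"]), ("upper", [])]))
# upper(replace('hello world', ' ', '_'))
```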
We extended our performance guide with a new recommendation: avoid joining on VARCHAR-typed columns (i.e., strings). The accompanying microbenchmark demonstrates a case where performing a large join on BIGINT columns is 2.6× faster than evaluating the same join on VARCHAR
DuckDB's co-creator Hannes Mühleisen gave a talk this week at the Hasso-Plattner-Institut (@HPI_DE) titled "Two Tier Architectures are Anachronistic". The recording is now available online.
We have released DuckDB v0.10.3, a bugfix release.
The command 'pip install duckdb --upgrade' already delivers the new version. DuckDB clients in other package management systems (CRAN, Maven, Homebrew, etc.) will be updated in the coming days.
For the release notes and binary
DuckCon #4 will feature a talk by @polinaeterna of @huggingface titled “Hugging a Duck: democratizing data access and exploration with DuckDB and Hugging Face Hub”.
The talk will explain how they use DuckDB to allow people to easily explore over 250k public datasets on the
A reminder: DuckDB has a tldr page. If you have @tldr_pages installed, you can get examples of the most common command-line arguments by running
$ tldr duckdb
We have released DuckDB v0.10.1, a bugfix release. For installation instructions, see:
This release fixes several issues with the CSV parser and tackles scenarios which previously resulted in out-of-memory (OOM) errors (details in 🧵).
We held DuckCon #4 today in Amsterdam. Thanks to all speakers and attendees for making this an amazing event, and to @RillData for sponsoring the drinks & snacks!
The speaker decks are available on the event's site:
The recordings will be published in
DuckDB's DevRel, @szarnyasg, gave a talk last November at the @oredev conference titled "DuckDB: Harnessing in-process analytics for data science and beyond". The recording is now available:
The slide deck is here:
DuckCon #4 is next week in Amsterdam, on Friday, Feb 2. Subash Roul from @fivetran is going to talk about building data lakes using DuckDB.
See the rest of the talks and the registration link at
A new DuckDB article is out on Datanami with quotes from @hfmuehleisen:
DuckDB Walks to the Beat of Its Own Analytics Drum
“DuckDB has this different angle,” Mühleisen said. “It’s more like something that you put into a workflow rather than something
Laurens Kuiper will present his paper "Robust External Hash Aggregation in the Solid State Age" tomorrow at ICDE 2024 in Utrecht. This work describes the techniques that make larger-than-memory aggregation possible in DuckDB. The paper is co-authored by Peter Boncz (@peterabcz)
We have released DuckDB v0.9.1 today. This is a bug fix release for various issues discovered after we released 0.9.0. There are no new features, just bug fixes. Database files created by DuckDB v0.9.0 can be read by DuckDB v0.9.1.
The second talk at DuckCon #4 was presented by Polina Kazakova (@polinaeterna) of Hugging Face (@huggingface) with the title “Hugging a Duck: Democratizing Data Access and Exploration with DuckDB and Hugging Face Hub”.
Video:
Slides:
New blog post by @samansmink: Dependency Management in DuckDB Extensions
TL;DR: While core DuckDB has zero external dependencies, building extensions with dependencies is now very simple, with built-in support for vcpkg.
We are excited to invite you all to our second "DuckCon" user group meeting. It will take place in Brussels on Friday, Feb 3rd, 2023, the day before @fosdem. In person, pandemic permitting.
We are launching a user survey to learn more about how the community uses DuckDB! Please visit the form at
The survey takes about 5 minutes and is anonymous. However, you can enter your email address to participate in a raffle where we give away DuckDB
Last week we had a blast in San Francisco at DuckCon #3. The talks are now available online. Special thanks to @mehd_io for creating these recordings!
We have released DuckDB v0.9.2 today. This is a bug fix release. Database files created by both DuckDB v0.9.0 and v0.9.1 can be read by DuckDB v0.9.2, and vice versa.
DuckDB's co-creator Hannes Mühleisen (@hfmuehleisen) will give an invited talk on Wednesday May 15th at 17:00. The talk, presented at the industry track of ICDE 2024 in Utrecht, is titled "How I Learned to Stop Worrying About Benchmarks".
DuckDB's release calendar was recently extended with pictograms of the ducks used for the release codenames. Check it out at
PS: The next version of DuckDB, v0.10.3, is expected to be released on May 20.
Save the date: On 2024-02-02 we will hold DuckCon #4 in Amsterdam, the birthplace of DuckDB. This date is the Friday before the @fosdem conference in Brussels. Please register and consider submitting a lightning talk proposal!