🛢️
DuckDB
On this page
DuckDB is a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use. DuckDB provides a rich SQL dialect, with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs), and more.
Why DuckDB
- simple
- portable
- fast
- feature-rich
- extensible
- free
Install
macOS
zsh
brew install duckdb
Usages
sql
duckdb /path/to/your/database.duckdb
Describe the table
sql
DESCRIBE SELECT * FROM events;
JSON
sql
SELECT * FROM 'todos.json';
zsh
duckdb -json -c \"select license->>'key' as license, count(*) as count \from read_json('https://api.github.com/orgs/golang/repos') \group by 1 \order by count desc" \| jq
sql
SELECT type, count(*) countFROM 'gharchive_gz/*.json.gz'GROUP BY typeORDER BY count DESC;
Parquet - A Column-Oriented Data File Format
Apache Parquet is an open-source data file format specifically designed for efficient data storage and retrieval, particularly optimized for handling complex data in bulk.
sql
CREATE TABLE test AS SELECT * FROM read_parquet('test.parquet');
ts
// load local parquet files into memoryconst loadData = async (name: string | number) => {const url = `data/${name}.parquet`;const res = await fetch(url);await db.registerFileBuffer(`data_${name}.parquet`,new Uint8Array(await res.arrayBuffer()));};console.time("load data");// Parquet demo data listhttps://github.com/duckdb/duckdb-data/releases/tag/v1.0await Promise.all([4, 5, 6].map(loadData));console.timeEnd("load data");const result = await conn.query(`SELECTCOUNT(*) as countFROMparquet_scan('data_*.parquet')`);console.timeEnd("count");console.log("count: ", result.toArray()[0].count);
Geospatial
Turn a row into a geojson object
sql
-- install the spatial extensionINSTALL spatial;LOAD spatial;CREATE TABLE locations (id INTEGER PRIMARY KEY,name VARCHAR,longitude DOUBLE,latitude DOUBLE)INSERT INTO locations (id, name, longitude, latitude) VALUES(1, 'Location A', 40.7128, -74.0060),(2, 'Location B', 34.0522, -118.2437),(3, 'Location C', 51.5074, -0.1278);-- spatial extension waySELECTjson_object('type', 'Feature','properties', json_object('name', name),'geometry', ST_AsGeoJSON(ST_Point(longitude, latitude))::JSON) AS geojsonFROM locations-- non spatial extension waySELECTjson_object('type', 'Feature','geometry', json_object('type', 'Point','coordinates', [longitude, latitude]),'properties', json_object('name', name)) AS geojsonFROM locations
To get the centroid
sql
CREATE TABLE geojson_data AS SELECT ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[0,0],[10,0],[10,10],[0,10],[0,0]]]}') AS geom;SELECT ST_AsGeoJSON(ST_Centroid(geom)) AS centroid_geojson FROM geojson_data;-- Get the longitude and latitudeSELECTST_X(ST_Centroid(geom)) AS longitude,ST_Y(ST_Centroid(geom)) AS latitudeFROM geojson_data;
Resources
- serverless-duckdb: An example of how to run DuckDB on AWS Lambda & API Gateway.
Tools
- VS Code Parquet Explorer
- SQLTools
- DuckDB Driver for SQLTools
- duckdb-nodejs-layer:Packaging DuckDB for usage in AWS Lambda functions with Node.js, and publishing as public Lambda layers.
- DB Pilot