Unnesting Columns
[Note] unnesting is an experimental feature for TidierDB
TidierDB now supports unnesting both arrays and structs
using TidierDB
db = connect(duckdb())
DuckDB.query(db, "
CREATE OR REPLACE TABLE nested_table (
id INTEGER,
coord ROW(lat DOUBLE, lon DOUBLE),
loc ROW(city STRING, country STRING),
info ROW(continent STRING, climate STRING)
);
INSERT INTO nested_table VALUES
(1, ROW(40.7128, -74.0060), ROW('New York', 'USA'), ROW('North America', 'Temperate')),
(2, ROW(48.8566, 2.3522), ROW('Paris', 'France'), ROW('Europe', 'Temperate')),
(3, ROW(35.6895, 139.6917), ROW('Tokyo', 'Japan'), ROW('Asia', 'Humid Subtropical'));");
@unnest_wider
¤
@unnest_wider
at this time only supports unnesting columns of type STRUCT
@chain dt(db, "nested_table") begin
@unnest_wider(coord:info)
@collect
end
Row | id | lat | lon | city | country | continent | climate |
---|---|---|---|---|---|---|---|
Int32 | Float64 | Float64 | String | String | String | String | |
1 | 1 | 40.7128 | -74.006 | New York | USA | North America | Temperate |
2 | 2 | 48.8566 | 2.3522 | Paris | France | Europe | Temperate |
3 | 3 | 35.6895 | 139.692 | Tokyo | Japan | Asia | Humid Subtropical |
Single elements can be extracted a new column like so, or with any of existing LIST backend function as well.
@chain dt(db, "nested_table") begin
@mutate(city = loc.city)
@collect
end
Row | id | coord | loc | info | city |
---|---|---|---|---|---|
Int32 | NamedTup…? | NamedTup…? | NamedTup…? | String | |
1 | 1 | (lat = 40.7128, lon = -74.006) | (city = "New York", country = "USA") | (continent = "North America", climate = "Temperate") | New York |
2 | 2 | (lat = 48.8566, lon = 2.3522) | (city = "Paris", country = "France") | (continent = "Europe", climate = "Temperate") | Paris |
3 | 3 | (lat = 35.6895, lon = 139.692) | (city = "Tokyo", country = "Japan") | (continent = "Asia", climate = "Humid Subtropical") | Tokyo |
Create a new table for with columns of arrays for the following example with @unnest_longer
DuckDB.query(db, "
CREATE TABLE nt (
id INTEGER,
data ROW(a INTEGER[], b INTEGER[])
);
INSERT INTO nt VALUES
(1, (ARRAY[1,2], ARRAY[3,4])),
(2, (ARRAY[5,6], ARRAY[7,8,9])),
(3, (ARRAY[10,11], ARRAY[12,13]));");
@unnest_longer
¤
In this example, we will first @unnest_wider
data column into 2 columns a
and b
, before flattening the arrays within them with @unnest_longer
@chain dt(db, "nt") begin
@unnest_wider(data)
@unnest_longer(a, b)
@collect
end
Row | id | a | b |
---|---|---|---|
Int32 | Int32? | Int32 | |
1 | 1 | 1 | 3 |
2 | 1 | 2 | 4 |
3 | 2 | 5 | 7 |
4 | 2 | 6 | 8 |
5 | 2 | missing | 9 |
6 | 3 | 10 | 12 |
7 | 3 | 11 | 13 |
Exploded JSON¤
Users may come across table columns that do not unnest with @unnest_wider
and @unnest_longer
. TidierDB now has an experimental method to enable using DuckDB's UNNEST
function with transmute to explode such a column and drop all other columns.
@chain dt(db, "read_json('https://environment.data.gov.uk/flood-monitoring/id/measures')", alias = "meas") begin
@transmute(unnest(items))
@head 3
@collect
end
Row | @id | datumType | label | latestReading | notation | parameter | parameterName | period | qualifier | station | stationReference | unit | unitName | valueType |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String | String | String | NamedTup…? | String | String | String | Int64 | String | String | String | String | String | String | |
1 | http://environment.data.gov.uk/flood-monitoring/id/measures/1029TH-level-downstage-i-15_min-mASD | http://environment.data.gov.uk/flood-monitoring/def/core/datumASD | RIVER DIKLER AT BOURTON ON THE WATER - level-downstage-i-15_min-mASD | (var"@id" = "http://environment.data.gov.uk/flood-monitoring/data/readings/1029TH-level-downstage-i-15_min-mASD/2025-03-10T00-00-00Z", date = Date("2025-03-10"), dateTime = DateTime("2025-03-10T00:00:00"), measure = "http://environment.data.gov.uk/flood-monitoring/id/measures/1029TH-level-downstage-i-15_min-mASD", value = -0.173) | 1029TH-level-downstage-i-15_min-mASD | level | Water Level | 900 | Downstream Stage | http://environment.data.gov.uk/flood-monitoring/id/stations/1029TH | 1029TH | http://qudt.org/1.1/vocab/unit#Meter | mASD | instantaneous |
2 | http://environment.data.gov.uk/flood-monitoring/id/measures/1029TH-level-stage-i-15_min-mASD | http://environment.data.gov.uk/flood-monitoring/def/core/datumASD | RIVER DIKLER AT BOURTON ON THE WATER - level-stage-i-15_min-mASD | (var"@id" = "http://environment.data.gov.uk/flood-monitoring/data/readings/1029TH-level-stage-i-15_min-mASD/2025-03-10T00-00-00Z", date = Date("2025-03-10"), dateTime = DateTime("2025-03-10T00:00:00"), measure = "http://environment.data.gov.uk/flood-monitoring/id/measures/1029TH-level-stage-i-15_min-mASD", value = 0.27) | 1029TH-level-stage-i-15_min-mASD | level | Water Level | 900 | Stage | http://environment.data.gov.uk/flood-monitoring/id/stations/1029TH | 1029TH | http://qudt.org/1.1/vocab/unit#Meter | mASD | instantaneous |
3 | http://environment.data.gov.uk/flood-monitoring/id/measures/E2043-level-stage-i-15_min-mASD | http://environment.data.gov.uk/flood-monitoring/def/core/datumASD | SURFLEET SLUICE WITS - level-stage-i-15_min-mASD | (var"@id" = "http://environment.data.gov.uk/flood-monitoring/data/readings/E2043-level-stage-i-15_min-mASD/2025-03-10T00-00-00Z", date = Date("2025-03-10"), dateTime = DateTime("2025-03-10T00:00:00"), measure = "http://environment.data.gov.uk/flood-monitoring/id/measures/E2043-level-stage-i-15_min-mASD", value = 0.488) | E2043-level-stage-i-15_min-mASD | level | Water Level | 900 | Stage | http://environment.data.gov.uk/flood-monitoring/id/stations/E2043 | E2043 | http://qudt.org/1.1/vocab/unit#Meter | mASD | instantaneous |
This page was generated using Literate.jl.