Skip to content

Unnesting Columns

[Note] unnesting is an experimental feature for TidierDB

TidierDB now supports unnesting both arrays and structs

using TidierDB
db = connect(duckdb())

DuckDB.query(db, "
        CREATE OR REPLACE TABLE nested_table (
            id INTEGER,
            coord ROW(lat DOUBLE, lon DOUBLE),
            loc ROW(city STRING, country STRING),
            info ROW(continent STRING, climate STRING)
        );
        INSERT INTO nested_table VALUES
            (1, ROW(40.7128, -74.0060), ROW('New York', 'USA'), ROW('North America', 'Temperate')),
            (2, ROW(48.8566, 2.3522), ROW('Paris', 'France'), ROW('Europe', 'Temperate')),
            (3, ROW(35.6895, 139.6917), ROW('Tokyo', 'Japan'), ROW('Asia', 'Humid Subtropical'));");

@unnest_wider¤

@unnest_wider at this time only supports unnesting columns of type STRUCT

@chain dt(db, "nested_table") begin
    @unnest_wider(coord:info)
    @collect
end
3×7 DataFrame
Rowidlatloncitycountrycontinentclimate
Int32Float64Float64StringStringStringString
1140.7128-74.006New YorkUSANorth AmericaTemperate
2248.85662.3522ParisFranceEuropeTemperate
3335.6895139.692TokyoJapanAsiaHumid Subtropical

Single elements can be extracted a new column like so, or with any of existing LIST backend function as well.

@chain dt(db, "nested_table") begin
    @mutate(city = loc.city)
    @collect
end
3×5 DataFrame
Rowidcoordlocinfocity
Int32NamedTup…?NamedTup…?NamedTup…?String
11(lat = 40.7128, lon = -74.006)(city = "New York", country = "USA")(continent = "North America", climate = "Temperate")New York
22(lat = 48.8566, lon = 2.3522)(city = "Paris", country = "France")(continent = "Europe", climate = "Temperate")Paris
33(lat = 35.6895, lon = 139.692)(city = "Tokyo", country = "Japan")(continent = "Asia", climate = "Humid Subtropical")Tokyo

Create a new table for with columns of arrays for the following example with @unnest_longer

DuckDB.query(db, "
    CREATE TABLE nt (
        id INTEGER,
        data ROW(a INTEGER[], b INTEGER[])
        );
    INSERT INTO nt VALUES
        (1, (ARRAY[1,2], ARRAY[3,4])),
        (2, (ARRAY[5,6], ARRAY[7,8,9])),
        (3, (ARRAY[10,11], ARRAY[12,13]));");

@unnest_longer¤

In this example, we will first @unnest_wider data column into 2 columns a and b, before flattening the arrays within them with @unnest_longer

@chain dt(db, "nt") begin
    @unnest_wider(data)
    @unnest_longer(a, b)
    @collect
end
7×3 DataFrame
Rowidab
Int32Int32?Int32
1113
2124
3257
4268
52missing9
631012
731113

Exploded JSON¤

Users may come across table columns that do not unnest with @unnest_wider and @unnest_longer. TidierDB now has an experimental method to enable using DuckDB's UNNEST function with transmute to explode such a column and drop all other columns.

@chain dt(db, "read_json('https://environment.data.gov.uk/flood-monitoring/id/measures')", alias = "meas") begin
    @transmute(unnest(items))
    @head 3
    @collect
end
3×14 DataFrame
Row@iddatumTypelabellatestReadingnotationparameterparameterNameperiodqualifierstationstationReferenceunitunitNamevalueType
StringStringStringNamedTup…?StringStringStringInt64StringStringStringStringStringString
1http://environment.data.gov.uk/flood-monitoring/id/measures/1029TH-level-downstage-i-15_min-mASDhttp://environment.data.gov.uk/flood-monitoring/def/core/datumASDRIVER DIKLER AT BOURTON ON THE WATER - level-downstage-i-15_min-mASD(var"@id" = "http://environment.data.gov.uk/flood-monitoring/data/readings/1029TH-level-downstage-i-15_min-mASD/2025-03-10T00-00-00Z", date = Date("2025-03-10"), dateTime = DateTime("2025-03-10T00:00:00"), measure = "http://environment.data.gov.uk/flood-monitoring/id/measures/1029TH-level-downstage-i-15_min-mASD", value = -0.173)1029TH-level-downstage-i-15_min-mASDlevelWater Level900Downstream Stagehttp://environment.data.gov.uk/flood-monitoring/id/stations/1029TH1029THhttp://qudt.org/1.1/vocab/unit#MetermASDinstantaneous
2http://environment.data.gov.uk/flood-monitoring/id/measures/1029TH-level-stage-i-15_min-mASDhttp://environment.data.gov.uk/flood-monitoring/def/core/datumASDRIVER DIKLER AT BOURTON ON THE WATER - level-stage-i-15_min-mASD(var"@id" = "http://environment.data.gov.uk/flood-monitoring/data/readings/1029TH-level-stage-i-15_min-mASD/2025-03-10T00-00-00Z", date = Date("2025-03-10"), dateTime = DateTime("2025-03-10T00:00:00"), measure = "http://environment.data.gov.uk/flood-monitoring/id/measures/1029TH-level-stage-i-15_min-mASD", value = 0.27)1029TH-level-stage-i-15_min-mASDlevelWater Level900Stagehttp://environment.data.gov.uk/flood-monitoring/id/stations/1029TH1029THhttp://qudt.org/1.1/vocab/unit#MetermASDinstantaneous
3http://environment.data.gov.uk/flood-monitoring/id/measures/E2043-level-stage-i-15_min-mASDhttp://environment.data.gov.uk/flood-monitoring/def/core/datumASDSURFLEET SLUICE WITS - level-stage-i-15_min-mASD(var"@id" = "http://environment.data.gov.uk/flood-monitoring/data/readings/E2043-level-stage-i-15_min-mASD/2025-03-10T00-00-00Z", date = Date("2025-03-10"), dateTime = DateTime("2025-03-10T00:00:00"), measure = "http://environment.data.gov.uk/flood-monitoring/id/measures/E2043-level-stage-i-15_min-mASD", value = 0.488)E2043-level-stage-i-15_min-mASDlevelWater Level900Stagehttp://environment.data.gov.uk/flood-monitoring/id/stations/E2043E2043http://qudt.org/1.1/vocab/unit#MetermASDinstantaneous

This page was generated using Literate.jl.