Skip to content

Nesting

@nest¤

Nest columns into a dataframe nested into a new column

using TidierData

df4 = DataFrame(x = ["a", "b", "a", "b", "C", "a"], y = 1:6, yz = 13:18, a = 7:12, ab = 12:-1:7)

nested_df = @nest(df4, n2 = starts_with("a"), n3 = y:yz)
3×3 DataFrame
Rowxn3n2
StringDataFrameDataFrame
1a3×2 DataFrame3×2 DataFrame
2b2×2 DataFrame2×2 DataFrame
3C1×2 DataFrame1×2 DataFrame

To return to the original dataframe, you can unnest wider and then longer.

@chain nested_df begin
    @unnest_wider(n3:n2)
    @unnest_longer(y:ab)
end
6×5 DataFrame
Rowxyyzaab
StringInt64Int64Int64Int64
1a113712
2a315910
3a618127
4b214811
5b416109
6C517118

Or you can unnest longer and then wider.

@chain nested_df begin
  @unnest_longer(n3:n2)
  @unnest_wider(n3:n2)
end
6×5 DataFrame
Rowxyzyaab
StringInt64Int64Int64Int64
1a131712
2a153910
3a186127
4b142811
5b164109
6C175118

@unnest_longer¤

@unnest_longer adds one row per entry of an array or dataframe, lengthening dataframe by flattening the column or columns.

df = DataFrame(x = 1:4, y = [[], [1, 2, 3], [4, 5], Int[]]);

@chain df begin
    @unnest_longer(y)
end
5×2 DataFrame
Rowxy
Int64Any
121
222
323
434
535

If there are rows with empty arrays, keep_empty will prevent these rows from being dropped. include_indices will add a new column for each flattened column that logs the position of each entry in the array.

@chain df begin
    @unnest_longer(y, keep_empty = true, indices_include = true)
end
7×3 DataFrame
Rowxyy_id
Int64AnyInt64
11missing1
2211
3222
4233
5341
6352
74missing1

@unnest_wider¤

@unnest_wider will widen a column or column(s) of Dicts, Arrays, Tuples or Dataframes into multiple columns.

df2 = DataFrame(
           name = ["Zaki", "Farida"],
           attributes = [
               Dict("age" => 25, "city" => "New York"),
               Dict("age" => 30, "city" => "Los Angeles")]);

@chain df2 begin
    @unnest_wider(attributes)
end
2×3 DataFrame
Rownamecityage
StringStringInt64
1ZakiNew York25
2FaridaLos Angeles30

Unnesting nested Dataframes with different lengths which contains arrays¤

df3 = DataFrame(
    x = 1:3,
    y = Any[
        DataFrame(),
        DataFrame(a = ["A"], b = [14]),
        DataFrame(a = ["A", "B", "C"], b = [13, 12, 11], c = [4, 4, 4])
    ]
)
3×2 DataFrame
Rowxy
Int64Any
110×0 DataFrame
221×2 DataFrame
333×3 DataFrame

df3 contains dataframes in with different widths that also contain arrays. Chaining together @unnest_wider and @unnest_longer will unnest the columns to tuples first and then they will be fully unnested after.

@chain df3 begin
    @unnest_wider(y)
    @unnest_longer(a:c, keep_empty = true)
end
5×4 DataFrame
Rowxabc
Int64AnyInt64?Int64?
11missingmissingmissing
22A14missing
33A134
43B124
53C114

This page was generated using Literate.jl.