Skip to content

Nesting

@nest¤

Nest columns into a dataframe nested into a new column

using TidierData

df4 = DataFrame(x = ["a", "b", "a", "b", "C", "a"], y = 1:6, yz = 13:18, a = 7:12, ab = 12:-1:7)

nested_df = @nest(df4, n2 = starts_with("a"), n3 = y:yz)
3×3 DataFrame
Rowxn3n2
StringDataFrameDataFrame
1a3×2 DataFrame3×2 DataFrame
2b2×2 DataFrame2×2 DataFrame
3C1×2 DataFrame1×2 DataFrame

To return to the original dataframe, you can unnest wider and then longer.

@chain nested_df begin
    @unnest_wider(n3:n2, names_sep = nothing)
    @unnest_longer(y:ab)
end
6×5 DataFrame
Rowxyyzaab
StringInt64Int64Int64Int64
1a113712
2a315910
3a618127
4b214811
5b416109
6C517118

Or you can unnest longer and then wider.

@chain nested_df begin
  @unnest_longer(n3:n2)
  @unnest_wider(n3:n2, names_sep = nothing)
end
6×5 DataFrame
Rowxyzyaab
StringInt64Int64Int64Int64
1a131712
2a153910
3a186127
4b142811
5b164109
6C175118

@unnest_longer¤

@unnest_longer adds one row per entry of an array or dataframe, lengthening dataframe by flattening the column or columns.

df = DataFrame(x = 1:4, y = [[], [1, 2, 3], [4, 5], Int[]]);

@chain df begin
    @unnest_longer(y)
end
5×2 DataFrame
Rowxy
Int64Any
121
222
323
434
535

If there are rows with empty arrays, keep_empty will prevent these rows from being dropped. include_indices will add a new column for each flattened column that logs the position of each entry in the array.

@chain df begin
    @unnest_longer(y, keep_empty = true, indices_include = true)
end
7×3 DataFrame
Rowxyy_id
Int64AnyInt64
11missing1
2211
3222
4233
5341
6352
74missing1

@unnest_wider¤

@unnest_wider will widen a column or column(s) of Dicts, Arrays, Tuples or Dataframes into multiple columns.

df2 = DataFrame(
           name = ["Zaki", "Farida"],
           attributes = [
               Dict("age" => 25, "city" => "New York"),
               Dict("age" => 30, "city" => "Los Angeles")]);

@chain df2 begin
    @unnest_wider(attributes)
end
2×3 DataFrame
Rownameattributes_cityattributes_age
StringStringInt64
1ZakiNew York25
2FaridaLos Angeles30

Unnesting nested Dataframes with different lengths which contains arrays¤

df3 = DataFrame(
    x = 1:3,
    y = Any[
        DataFrame(),
        DataFrame(a = ["A"], b = [14]),
        DataFrame(a = ["A", "B", "C"], b = [13, 12, 11], c = [4, 4, 4])
    ]
)
3×2 DataFrame
Rowxy
Int64Any
110×0 DataFrame
221×2 DataFrame
333×3 DataFrame

df3 contains dataframes in with different widths that also contain arrays. Chaining together @unnest_wider and @unnest_longer will unnest the columns to tuples first and then they will be fully unnested after.

@chain df3 begin
    @unnest_wider(y)
    @unnest_longer(y_a:y_c, keep_empty = true)
end
5×4 DataFrame
Rowxy_ay_by_c
Int64AnyInt64?Int64?
11missingmissingmissing
22A14missing
33A134
43B124
53C114

unnest JSON files¤

using JSON

json_str = """
       {
           "name": "Chris",
           "age": 23,
           "address": {
               "city": "New York",
               "country": "America"
           },
           "friends": [
               {
                   "name": "Emily",
                   "hobbies": [ "biking", "music", "gaming" ]
               },
               {
                   "name": "John",
                   "hobbies": [ "soccer", "gaming" ]
               }
           ]
       }
       """;
json_df = DataFrame(JSON.parse(json_str))

@chain json_df begin
       @unnest_wider(address, friends)
end
2×6 DataFrame
Rowagenameaddress_cityaddress_countryfriends_namefriends_hobbies
Int64StringStringStringStringArray…
123ChrisNew YorkAmericaEmilyAny["biking", "music", "gaming"]
223ChrisNew YorkAmericaJohnAny["soccer", "gaming"]

This page was generated using Literate.jl.