@nest
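@nest packs groups of columns into nested DataFrames, storing each group in a new column and leaving one row per unique combination of the remaining columns.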
using TidierData
df4 = DataFrame(x = ["a", "b", "a", "b", "C", "a"], y = 1:6, yz = 13:18, a = 7:12, ab = 12:-1:7)
nested_df = @nest(df4, n2 = starts_with("a"), n3 = y:yz)
Row | x | n3 | n2 |
1 | a | 3×2 DataFrame | 3×2 DataFrame |
2 | b | 2×2 DataFrame | 2×2 DataFrame |
3 | C | 1×2 DataFrame | 1×2 DataFrame |
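Conceptually, nesting groups the rows by the columns that are not nested (here x) and gathers the selected columns of each group into a small table. The sketch below illustrates the idea in plain Julia, without TidierData or DataFrames. It assumes a table is a NamedTuple of equal-length vectors, and nest_sketch is a hypothetical helper, not part of the TidierData API.

```julia
# Hypothetical helper, not part of TidierData: a Base-only sketch of nesting.
# Assumption: a table is a NamedTuple of equal-length vectors.
function nest_sketch(tbl::NamedTuple, key::Symbol, nested::Vector{Symbol})
    groups = unique(tbl[key])  # group keys, in order of first appearance
    data = map(groups) do k
        rows = findall(==(k), tbl[key])  # row indices that belong to group k
        # Gather the nested columns of these rows into one small column table
        NamedTuple{Tuple(nested)}(Tuple(tbl[c][rows] for c in nested))
    end
    # One row per group: the key column plus a column of nested tables
    return NamedTuple{(key, :data)}((groups, data))
end

tbl4 = (x = ["a", "b", "a", "b", "C", "a"], y = 1:6, yz = 13:18)
nested = nest_sketch(tbl4, :x, [:y, :yz])
nested.x           # ["a", "b", "C"]
nested.data[1].y   # [1, 3, 6]
nested.data[3].yz  # [17]
```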
To recover the original data, you can unnest wider and then longer; the rows come back grouped by x.
@chain nested_df begin
    @unnest_wider(n3:n2, names_sep = nothing)
    @unnest_longer(y:ab)
end
Row | x | y | yz | a | ab |
1 | a | 1 | 13 | 7 | 12 |
2 | a | 3 | 15 | 9 | 10 |
3 | a | 6 | 18 | 12 | 7 |
4 | b | 2 | 14 | 8 | 11 |
5 | b | 4 | 16 | 10 | 9 |
6 | C | 5 | 17 | 11 | 8 |
Or you can unnest longer and then wider.
@chain nested_df begin
    @unnest_longer(n3:n2)
    @unnest_wider(n3:n2, names_sep = nothing)
end
Row | x | yz | y | a | ab |
1 | a | 13 | 1 | 7 | 12 |
2 | a | 15 | 3 | 9 | 10 |
3 | a | 18 | 6 | 12 | 7 |
4 | b | 14 | 2 | 8 | 11 |
5 | b | 16 | 4 | 10 | 9 |
6 | C | 17 | 5 | 11 | 8 |
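Either order reverses the nesting: each group's key is repeated once per nested row and the nested columns are concatenated, which is why the rows come back grouped by x rather than in their original order. A Base-Julia sketch of that inverse step, again assuming NamedTuple column tables; unnest_sketch is a hypothetical helper, not TidierData's implementation.

```julia
# Hypothetical helper, not part of TidierData: a Base-only sketch of unnesting.
# Assumption: `nested` has a key column and a `data` column of NamedTuple tables.
function unnest_sketch(nested::NamedTuple, key::Symbol)
    groups = nested.data
    cols = keys(first(groups))  # nested column names, e.g. (:y, :yz)
    # Repeat each group's key once per row of that group
    keycol = reduce(vcat, [fill(k, length(first(g))) for (k, g) in zip(nested[key], groups)])
    # Concatenate each nested column across all groups, in group order
    inner = NamedTuple{cols}(Tuple(reduce(vcat, [g[c] for g in groups]) for c in cols))
    return merge(NamedTuple{(key,)}((keycol,)), inner)
end

nested = (x = ["a", "b", "C"],
          data = [(y = [1, 3, 6], yz = [13, 15, 18]),
                  (y = [2, 4], yz = [14, 16]),
                  (y = [5], yz = [17])])
flat = unnest_sketch(nested, :x)
flat.x  # ["a", "a", "a", "b", "b", "C"]
flat.y  # [1, 3, 6, 2, 4, 5]
```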
@unnest_longer
@unnest_longer adds one row per entry of an array or DataFrame, lengthening the DataFrame by flattening the given column or columns.
df = DataFrame(x = 1:4, y = [[], [1, 2, 3], [4, 5], Int[]]);
@chain df begin
    @unnest_longer(y)
end
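By default each array entry becomes its own row and rows whose array is empty disappear. The core of that flattening can be sketched in Base Julia; this is an illustration under the assumption that the data lives in plain vectors, not TidierData's actual code.

```julia
# Base-only sketch of the flattening behind @unnest_longer (not TidierData's code).
# Assumption: the columns are plain vectors rather than DataFrame columns.
x = [1, 2, 3, 4]
y = [[], [1, 2, 3], [4, 5], Int[]]

lens = length.(y)                      # entries per row: [0, 3, 2, 0]
long_x = reduce(vcat, fill.(x, lens))  # repeat each x once per entry in its array
long_y = reduce(vcat, y)               # concatenate the arrays in row order

long_x  # [2, 2, 2, 3, 3]  (x = 1 and x = 4 had empty arrays and are dropped)
long_y  # [1, 2, 3, 4, 5]
```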
If some rows hold empty arrays, keep_empty = true keeps those rows, filling the flattened column with missing, instead of dropping them. indices_include = true adds, for each flattened column, a new column recording each entry's position in its original array; it is the last column in the output below.
@chain df begin
    @unnest_longer(y, keep_empty = true, indices_include = true)
end
1 | 1 | missing | 1 |
2 | 2 | 1 | 1 |
3 | 2 | 2 | 2 |
4 | 2 | 3 | 3 |
5 | 3 | 4 | 1 |
6 | 3 | 5 | 2 |
7 | 4 | missing | 1 |
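The handling of empty arrays and the index column can be sketched with a plain loop. This is an assumption-laden illustration, not TidierData's implementation: unnest_longer_sketch is hypothetical, and the name y_id for the index column is invented for the sketch. An empty array yields a single missing entry with index 1, which matches rows 1 and 7 of the output above.

```julia
# Hypothetical helper, not TidierData's implementation: mimics keep_empty and
# indices_include for a single flattened column.
function unnest_longer_sketch(x, y; keep_empty = false, indices_include = false)
    out_x, out_y, out_id = eltype(x)[], Any[], Int[]
    for (xi, yi) in zip(x, y)
        if isempty(yi)
            keep_empty || continue  # by default, rows with empty arrays are dropped
            push!(out_x, xi); push!(out_y, missing); push!(out_id, 1)
        else
            for (i, v) in enumerate(yi)  # i is the entry's position in its array
                push!(out_x, xi); push!(out_y, v); push!(out_id, i)
            end
        end
    end
    # The index column name y_id is an assumption made for this sketch
    return indices_include ? (x = out_x, y = out_y, y_id = out_id) : (x = out_x, y = out_y)
end

r = unnest_longer_sketch([1, 2, 3, 4], [[], [1, 2, 3], [4, 5], Int[]];
                         keep_empty = true, indices_include = true)
r.x     # [1, 2, 2, 2, 3, 3, 4]
r.y_id  # [1, 1, 2, 3, 1, 2, 1]
```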
@unnest_wider
@unnest_wider widens a column (or columns) of Dicts, Arrays, Tuples or DataFrames into multiple columns.
df2 = DataFrame(
    name = ["Zaki", "Farida"],
    attributes = [
        Dict("age" => 25, "city" => "New York"),
        Dict("age" => 30, "city" => "Los Angeles")]);
@chain df2 begin
    @unnest_wider(attributes)
end
Row | name | attributes_city | attributes_age |
1 | Zaki | New York | 25 |
2 | Farida | Los Angeles | 30 |
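For Dicts, widening amounts to taking the union of keys and building one column per key, named from the source column and names_sep, with missing wherever a Dict lacks a key. The sketch below handles only Dicts (not Arrays, Tuples or DataFrames) and is a hypothetical helper, not TidierData's code. It sorts keys alphabetically just to make the result deterministic, whereas the TidierData output above happens to list city before age.

```julia
# Hypothetical helper, not TidierData's code: widens a vector of Dicts into one
# column per key, named "<prefix><names_sep><key>" unless names_sep is nothing.
function unnest_wider_sketch(col::AbstractVector{<:AbstractDict}, prefix::AbstractString;
                             names_sep = "_")
    ks = sort!(unique(reduce(vcat, [collect(keys(d)) for d in col])))  # union of keys
    colname(k) = names_sep === nothing ? string(k) : string(prefix, names_sep, k)
    # One name => values pair per key; missing where a Dict lacks that key
    return [colname(k) => [get(d, k, missing) for d in col] for k in ks]
end

attributes = [Dict("age" => 25, "city" => "New York"),
              Dict("age" => 30, "city" => "Los Angeles")]
cols = unnest_wider_sketch(attributes, "attributes")
# "attributes_age" => [25, 30], "attributes_city" => ["New York", "Los Angeles"]
bare = unnest_wider_sketch(attributes, "attributes"; names_sep = nothing)
# column names "age" and "city", without the prefix
```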
Unnesting nested DataFrames of different sizes that contain arrays
df3 = DataFrame(
    x = 1:3,
    y = Any[
        DataFrame(),
        DataFrame(a = ["A"], b = [14]),
        DataFrame(a = ["A", "B", "C"], b = [13, 12, 11], c = [4, 4, 4])
    ]
)
Row | x | y |
1 | 1 | 0×0 DataFrame |
2 | 2 | 1×2 DataFrame |
3 | 3 | 3×3 DataFrame |
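df3 holds nested DataFrames of different widths whose columns are arrays. Chaining @unnest_wider and @unnest_longer first spreads y into the columns y_a, y_b and y_c, whose cells still hold the nested values, and then flattens those values into rows. keep_empty = true keeps the row for x = 1, whose nested DataFrame is empty, as a row of missing values.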
@chain df3 begin
    @unnest_wider(y)
    @unnest_longer(y_a:y_c, keep_empty = true)
end
Row | x | y_a | y_b | y_c |
1 | 1 | missing | missing | missing |
2 | 2 | A | 14 | missing |
3 | 3 | A | 13 | 4 |
4 | 3 | B | 12 | 4 |
5 | 3 | C | 11 | 4 |
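The combined effect of the two steps can be sketched in a single pass: take the union of nested column names, give each a "y_" prefixed column, emit one row per nested row, and pad absent columns (and empty tables) with missing. This assumes each nested DataFrame is represented as a NamedTuple of equal-length vectors; unnest_tables_sketch is hypothetical and not how TidierData implements the chain.

```julia
# Hypothetical helper, not TidierData's code: widens nested tables into prefixed
# columns and lengthens them in one pass, padding with missing like keep_empty = true.
# Assumption: each nested table is a NamedTuple of equal-length vectors.
function unnest_tables_sketch(x, tables; prefix = "y")
    cols = Symbol[]
    for t in tables
        append!(cols, keys(t))
    end
    cols = sort!(unique(cols))  # union of nested column names across all tables
    out = Dict{String, Vector{Any}}("x" => Any[])
    for c in cols
        out[string(prefix, "_", c)] = Any[]
    end
    for (xi, t) in zip(x, tables)
        n = isempty(t) ? 1 : length(first(t))  # an empty table still yields one row
        for i in 1:n
            push!(out["x"], xi)
            for c in cols
                # Columns absent from this table are filled with missing
                push!(out[string(prefix, "_", c)], haskey(t, c) ? t[c][i] : missing)
            end
        end
    end
    return out
end

tables = [NamedTuple(),
          (a = ["A"], b = [14]),
          (a = ["A", "B", "C"], b = [13, 12, 11], c = [4, 4, 4])]
out = unnest_tables_sketch(1:3, tables)
out["y_c"]  # [missing, missing, 4, 4, 4]
```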
Unnesting JSON data
using JSON
json_str = """
{
    "name": "Chris",
    "age": 23,
    "address": {
        "city": "New York",
        "country": "America"
    },
    "friends": [
        {
            "name": "Emily",
            "hobbies": [ "biking", "music", "gaming" ]
        },
        {
            "name": "John",
            "hobbies": [ "soccer", "gaming" ]
        }
    ]
}
""";
json_df = DataFrame(JSON.parse(json_str))
@chain json_df begin
    @unnest_wider(address, friends)
end
Row | age | name | address_city | address_country | friends_name | friends_hobbies |
1 | 23 | Chris | New York | America | Emily | Any["biking", "music", "gaming"] |
2 | 23 | Chris | New York | America | John | Any["soccer", "gaming"] |
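The same reshaping can be sketched in plain Julia: scalar fields repeat on every row, the address Dict is widened once, and each element of friends becomes its own row with prefixed columns. The sketch assumes the JSON has already been parsed into a Dict, written out literally so JSON.jl is not needed, and widen_dict is a hypothetical helper rather than a TidierData or JSON.jl function.

```julia
# The parsed JSON record written as a Dict literal, so this sketch needs no JSON.jl.
record = Dict{String, Any}(
    "name" => "Chris",
    "age" => 23,
    "address" => Dict{String, Any}("city" => "New York", "country" => "America"),
    "friends" => Any[
        Dict{String, Any}("name" => "Emily", "hobbies" => Any["biking", "music", "gaming"]),
        Dict{String, Any}("name" => "John", "hobbies" => Any["soccer", "gaming"]),
    ],
)

# Hypothetical helper: prefix every key of a Dict, e.g. "city" becomes "address_city"
widen_dict(prefix, d) = Dict{String, Any}(string(prefix, "_", k) => v for (k, v) in d)

# One row per friend: scalar fields are repeated, the address Dict is widened,
# and each friend's Dict is widened into its own columns.
rows = map(record["friends"]) do f
    merge(Dict{String, Any}("name" => record["name"], "age" => record["age"]),
          widen_dict("address", record["address"]),
          widen_dict("friends", f))
end

rows[2]["friends_name"]     # "John"
rows[2]["friends_hobbies"]  # Any["soccer", "gaming"]
```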
This page was generated using Literate.jl.