Nesting
@nest
Nest the selected columns into dataframes stored in a new column, with one nested dataframe per group of the remaining columns.
using TidierData
df4 = DataFrame(x = ["a", "b", "a", "b", "C", "a"], y = 1:6, yz = 13:18, a = 7:12, ab = 12:-1:7)
nested_df = @nest(df4, n2 = starts_with("a"), n3 = y:yz)
Row | x (String) | n3 (DataFrame) | n2 (DataFrame) |
---|---|---|---|
1 | a | 3×2 DataFrame | 3×2 DataFrame |
2 | b | 2×2 DataFrame | 2×2 DataFrame |
3 | C | 1×2 DataFrame | 1×2 DataFrame |
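As a rough illustration of the grouping behind this table, the sketch below reproduces the `n3` nesting with plain Julia vectors instead of a dataframe. It is not how TidierData implements `@nest`: the names `groups`, `order`, and `n3` are hypothetical, and each nested table is stood in for by a NamedTuple of column vectors.

```julia
# Same values as the x, y, and yz columns of df4.
x  = ["a", "b", "a", "b", "C", "a"]
y  = 1:6
yz = 13:18

# Group row indices by the value of x, remembering first-appearance order.
groups = Dict{String, Vector{Int}}()
order  = String[]
for (i, key) in enumerate(x)
    if !haskey(groups, key)
        groups[key] = Int[]
        push!(order, key)
    end
    push!(groups[key], i)
end

# One "nested table" per group: a NamedTuple of the grouped column values.
n3 = [(y = y[groups[k]], yz = yz[groups[k]]) for k in order]
```

Each entry of `n3` holds the `y` and `yz` values for one value of `x`, so the three entries have 3, 2, and 1 rows, matching the sizes shown in the table.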
To return to the original dataframe, you can unnest wider and then longer.
@chain nested_df begin
    @unnest_wider(n3:n2)
    @unnest_longer(y:ab)
end
Row | x (String) | y (Int64) | yz (Int64) | a (Int64) | ab (Int64) |
---|---|---|---|---|---|
1 | a | 1 | 13 | 7 | 12 |
2 | a | 3 | 15 | 9 | 10 |
3 | a | 6 | 18 | 12 | 7 |
4 | b | 2 | 14 | 8 | 11 |
5 | b | 4 | 16 | 10 | 9 |
6 | C | 5 | 17 | 11 | 8 |
Or you can unnest longer and then wider.
@chain nested_df begin
    @unnest_longer(n3:n2)
    @unnest_wider(n3:n2)
end
Row | x (String) | yz (Int64) | y (Int64) | a (Int64) | ab (Int64) |
---|---|---|---|---|---|
1 | a | 13 | 1 | 7 | 12 |
2 | a | 15 | 3 | 9 | 10 |
3 | a | 18 | 6 | 12 | 7 |
4 | b | 14 | 2 | 8 | 11 |
5 | b | 16 | 4 | 10 | 9 |
6 | C | 17 | 5 | 11 | 8 |
@unnest_longer
@unnest_longer adds one row per entry of an array or dataframe, lengthening the dataframe by flattening the selected column or columns.
df = DataFrame(x = 1:4, y = [[], [1, 2, 3], [4, 5], Int[]]);
@chain df begin
    @unnest_longer(y)
end
Row | x (Int64) | y (Any) |
---|---|---|
1 | 2 | 1 |
2 | 2 | 2 |
3 | 2 | 3 |
4 | 3 | 4 |
5 | 3 | 5 |
If there are rows with empty arrays, setting keep_empty = true prevents these rows from being dropped; each one is kept as a single row holding missing. Setting indices_include = true adds a new column for each flattened column that records the position of each entry within its array.
@chain df begin
    @unnest_longer(y, keep_empty = true, indices_include = true)
end
Row | x (Int64) | y (Any) | y_id (Int64) |
---|---|---|---|
1 | 1 | missing | 1 |
2 | 2 | 1 | 1 |
3 | 2 | 2 | 2 |
4 | 2 | 3 | 3 |
5 | 3 | 4 | 1 |
6 | 3 | 5 | 2 |
7 | 4 | missing | 1 |
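The row-expansion rule described above can be sketched in plain Julia, without any packages. This is only an illustration of the semantics, not TidierData's implementation: `unnest_longer_sketch` is a hypothetical helper that works on two plain vectors and always records the index column, here called `y_id` to match the output above.

```julia
# Hypothetical helper, not part of TidierData: expands each entry of `y`
# into its own row, pairing it with the matching value of `x`.
function unnest_longer_sketch(x, y; keep_empty = false)
    rows = NamedTuple[]
    for (xi, yi) in zip(x, y)
        if isempty(yi)
            # An empty array contributes no rows unless keep_empty is set,
            # in which case it becomes a single row holding `missing`.
            keep_empty && push!(rows, (x = xi, y = missing, y_id = 1))
        else
            for (i, v) in enumerate(yi)
                # y_id records the position of the entry in its array.
                push!(rows, (x = xi, y = v, y_id = i))
            end
        end
    end
    return rows
end

x = 1:4
y = [[], [1, 2, 3], [4, 5], Int[]]
rows = unnest_longer_sketch(x, y; keep_empty = true)
```

With `keep_empty = true` the sketch yields seven rows, two of which hold `missing`; with the default it yields only the five rows that come from non-empty arrays.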
@unnest_wider
@unnest_wider widens one or more columns of Dicts, Arrays, Tuples, or DataFrames into multiple columns.
df2 = DataFrame(
    name = ["Zaki", "Farida"],
    attributes = [
        Dict("age" => 25, "city" => "New York"),
        Dict("age" => 30, "city" => "Los Angeles")]);
@chain df2 begin
    @unnest_wider(attributes)
end
Row | name (String) | city (String) | age (Int64) |
---|---|---|---|
1 | Zaki | New York | 25 |
2 | Farida | Los Angeles | 30 |
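To show what widening a column of Dicts amounts to, here is a minimal plain-Julia sketch that turns each key into its own column. It is an illustration under simplifying assumptions, not the macro's actual implementation; `all_keys` and `columns` are hypothetical names, and the result is a Dict of vectors rather than a dataframe.

```julia
attributes = [
    Dict("age" => 25, "city" => "New York"),
    Dict("age" => 30, "city" => "Los Angeles"),
]

# Collect every key that appears in any Dict, in first-seen order.
all_keys = String[]
for d in attributes, k in keys(d)
    k in all_keys || push!(all_keys, k)
end

# One new column per key; a Dict lacking a key would contribute `missing`.
columns = Dict(k => [get(d, k, missing) for d in attributes] for k in all_keys)
```

Here `columns["age"]` and `columns["city"]` hold the values that appear in the `age` and `city` columns of the table above.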
Unnesting nested dataframes of different sizes that contain arrays
df3 = DataFrame(
    x = 1:3,
    y = Any[
        DataFrame(),
        DataFrame(a = ["A"], b = [14]),
        DataFrame(a = ["A", "B", "C"], b = [13, 12, 11], c = [4, 4, 4])
    ]
)
Row | x (Int64) | y (Any) |
---|---|---|
1 | 1 | 0×0 DataFrame |
2 | 2 | 1×2 DataFrame |
3 | 3 | 3×3 DataFrame |
df3 contains dataframes of different widths that also contain arrays. Chaining @unnest_wider and @unnest_longer first unnests the columns to tuples and then fully unnests them.
@chain df3 begin
    @unnest_wider(y)
    @unnest_longer(a:c, keep_empty = true)
end
Row | x (Int64) | a (Any) | b (Int64?) | c (Int64?) |
---|---|---|---|---|
1 | 1 | missing | missing | missing |
2 | 2 | A | 14 | missing |
3 | 3 | A | 13 | 4 |
4 | 3 | B | 12 | 4 |
5 | 3 | C | 11 | 4 |
This page was generated using Literate.jl.