Binding
Joins combine data frames based on matching keys. Binding is another way to combine them: data frames can be stacked by rows or placed side by side by columns. TidierData.jl implements these actions using @bind_rows() and @bind_cols(), respectively.
Let's generate three data frames to combine.
using TidierData
df1 = DataFrame(a=1:3, b=1:3);
df2 = DataFrame(a=4:6, b=4:6);
df3 = DataFrame(a=7:9, c=7:9);
@bind_rows()
@bind_rows(df1, df2)
| Row | a | b |
|---|---|---|
| | Int64 | Int64 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
@bind_rows() keeps every column that is present in at least one of the provided data frames. Where a data frame lacks one of those columns, its rows are filled with missing values for that column.
@bind_rows(df1, df3)
| Row | a | b | c |
|---|---|---|---|
| | Int64 | Int64? | Int64? |
| 1 | 1 | 1 | missing |
| 2 | 2 | 2 | missing |
| 3 | 3 | 3 | missing |
| 4 | 7 | missing | 7 |
| 5 | 8 | missing | 8 |
| 6 | 9 | missing | 9 |
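The filling behavior can be checked directly on the result. The following is a minimal sketch that reuses df1 and df3 from above; the variable name stacked is only illustrative, and the checks use Base functions rather than anything specific to TidierData.jl.

```julia
using TidierData

df1 = DataFrame(a=1:3, b=1:3)
df3 = DataFrame(a=7:9, c=7:9)

stacked = @bind_rows(df1, df3)

# Column a is present in both inputs, so nothing had to be filled in.
count(ismissing, stacked.a)

# Columns b and c are each present in only one input, so the three rows
# contributed by the other input hold missing values.
count(ismissing, stacked.b)
count(ismissing, stacked.c)

# skipmissing() recovers only the values that actually came from an input.
collect(skipmissing(stacked.b))
```

Because the filled columns contain missing values, downstream reductions such as sum() on those columns will generally need skipmissing() or a similar step.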
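The optional id argument adds a column, named by the string you pass, that identifies which data frame each row came from. In the example below, the identifiers are 1, 2, and 3, following the order of the arguments. Note that both @bind_rows() and @bind_cols() accept multiple (i.e., more than 2) data frames, as in the example below.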
@bind_rows(df1, df2, df3, id = "id")
| Row | a | b | c | id |
|---|---|---|---|---|
| | Int64 | Int64? | Int64? | Int64 |
| 1 | 1 | 1 | missing | 1 |
| 2 | 2 | 2 | missing | 1 |
| 3 | 3 | 3 | missing | 1 |
| 4 | 4 | 4 | missing | 2 |
| 5 | 5 | 5 | missing | 2 |
| 6 | 6 | 6 | missing | 2 |
| 7 | 7 | missing | 7 | 3 |
| 8 | 8 | missing | 8 | 3 |
| 9 | 9 | missing | 9 | 3 |
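One use of the identifier column is to trace rows back to their source after binding. The sketch below assumes the same three data frames as above; the names combined, rows_per_source, and from_df3 are hypothetical and chosen only for illustration.

```julia
using TidierData

df1 = DataFrame(a=1:3, b=1:3)
df2 = DataFrame(a=4:6, b=4:6)
df3 = DataFrame(a=7:9, c=7:9)

combined = @bind_rows(df1, df2, df3, id = "id")

# Count how many rows each input contributed.
# The identifiers 1, 2, 3 follow the order of the arguments.
rows_per_source = [count(==(i), combined.id) for i in 1:3]

# Keep only the rows that came from the third input, df3.
from_df3 = combined[combined.id .== 3, :]
```

Here rows_per_source should contain three rows per input, and from_df3 should hold the values of df3 with column b entirely missing, matching the last three rows of the table above.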
@bind_cols()
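@bind_cols() works similarly to bind_cols() in R's tidyverse, although the .name_repair argument is not supported. When the data frames share column names, the duplicates are made unique with a numeric suffix, as with a_1 and b_1 in the output below.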
@bind_cols(df1, df2)
| Row | a | b | a_1 | b_1 |
|---|---|---|---|---|
| | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | 1 | 4 | 4 |
| 2 | 2 | 2 | 5 | 5 |
| 3 | 3 | 3 | 6 | 6 |
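Since the renamed columns keep their data, they can be used like any other column once you know their new names. This is a minimal sketch based on the output above; the names wide and a_diff are illustrative only.

```julia
using TidierData

df1 = DataFrame(a=1:3, b=1:3)
df2 = DataFrame(a=4:6, b=4:6)

wide = @bind_cols(df1, df2)

# Inspect the resulting column names; the duplicates from df2 carry a suffix.
names(wide)

# The suffixed column a_1 holds df2's a, so the two can be compared row by row.
a_diff = wide.a_1 .- wide.a
```

If the suffixed names are not descriptive enough, renaming the columns of one input before binding is one way to keep the result readable.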
This page was generated using Literate.jl.