Separating
Following the tidyverse syntax, the @separate() macro in TidierData.jl separates a single column into multiple columns. This is particularly useful for splitting a column containing delimited values into individual columns.
using TidierData
df = DataFrame(a = ["1-1", "2-2", "3-3-3"]);
@separate
Separate the "a" column into "b", "c", and "d" columns based on the dash delimiter.
@chain df begin
    @separate(a, (b, c, d), "-")
end
3×3 DataFrame
Row | b | c | d |
---|---|---|---|
 | SubStrin… | SubStrin… | SubStrin…? |
1 | 1 | 1 | missing |
2 | 2 | 2 | missing |
3 | 3 | 3 | 3 |
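The missing entries in the "d" column arise because the first two strings split into only two pieces. As a rough illustration in plain Julia, the following sketch splits each string and pads short results with missing. It uses only Base functions; the helper name split_pad is hypothetical and this is not how TidierData implements @separate.

```julia
# Hypothetical helper, not part of TidierData: split a string on a delimiter
# and pad the result with `missing` up to `n` pieces.
function split_pad(s::AbstractString, delim::AbstractString, n::Integer)
    parts = split(s, delim)  # Vector{SubString{String}}
    padded = Vector{Union{Missing, SubString{String}}}(missing, n)
    for i in 1:min(n, length(parts))
        padded[i] = parts[i]
    end
    return padded
end

rows = [split_pad(s, "-", 3) for s in ["1-1", "2-2", "3-3-3"]]
# rows[1] holds "1", "1", missing; rows[3] holds "3", "3", "3"
```

Each inner vector corresponds to one row of the output table, with the SubString element type matching the column types shown there.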
The names of the new (into) columns can also be passed as a vector, interpolated with !!:
new_names = ["x$(i)" for i in 1:3]; # or new_names = ["b", "c", "d"], or new_names = [:b, :c, :d]
@separate(df, a, !!new_names, "-")
3×3 DataFrame
Row | x1 | x2 | x3 |
---|---|---|---|
 | SubStrin… | SubStrin… | SubStrin…? |
1 | 1 | 1 | missing |
2 | 2 | 2 | missing |
3 | 3 | 3 | 3 |
@unite
The @unite macro combines multiple columns into one, separating the values with a user-specified delimiter. Here, the @unite macro combines the "b", "c", and "d" columns into a single new "new_col" column using the "/" delimiter.
df = DataFrame(
    b = ["1", "2", "3"],
    c = ["1", "2", "3"],
    d = [missing, missing, "3"]);
@chain df begin
    @unite(new_col, (b, c, d), "/")
end
3×1 DataFrame
Row | new_col |
---|---|
 | String |
1 | 1/1 |
2 | 2/2 |
3 | 3/3/3 |
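Note that the output above contains "1/1" rather than "1/1/missing": missing values are left out of the joined string. A minimal sketch of that row-wise behavior in plain Julia, assuming nothing about TidierData internals (the helper name unite_row is hypothetical):

```julia
# Sketch only: combine the values of one row, skipping `missing`,
# which matches the output shown above.
unite_row(vals, delim) = join(skipmissing(vals), delim)

b = ["1", "2", "3"]
c = ["1", "2", "3"]
d = [missing, missing, "3"]

# zip walks the three columns row by row
new_col = [unite_row(row, "/") for row in zip(b, c, d)]
# new_col == ["1/1", "2/2", "3/3/3"]
```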
@separate_rows
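The @separate_rows macro splits the delimited values in the selected columns into multiple rows, one row per value, based on a chosen delimiter. Values in the other columns are repeated for each new row.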
df = DataFrame(
    a = 1:3,
    b = ["a", "aa;bb;cc", "dd;ee"],
    c = ["1", "2;3;4", "5;6"],
    d = ["7", "8;9;10", "11;12"],
    e = ["11", "22;33;44", "55;66"]);
@separate_rows(df, b:e, ";")
6×5 DataFrame
Row | a | b | c | d | e |
---|---|---|---|---|---|
 | Int64 | SubStrin… | SubStrin… | SubStrin… | SubStrin… |
1 | 1 | a | 1 | 7 | 11 |
2 | 2 | aa | 2 | 8 | 22 |
3 | 2 | bb | 3 | 9 | 33 |
4 | 2 | cc | 4 | 10 | 44 |
5 | 3 | dd | 5 | 11 | 55 |
6 | 3 | ee | 6 | 12 | 66 |
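The row expansion shown above can be pictured with a short plain-Julia sketch. It is only an illustration, not TidierData's implementation: it handles two of the columns and assumes that, within each row, every split column yields the same number of pieces, as in the example data.

```julia
# Sketch only: expand delimited cells into one row per piece,
# repeating the untouched column `a` for each new row.
a = 1:3
b = ["a", "aa;bb;cc", "dd;ee"]
c = ["1", "2;3;4", "5;6"]

expanded = [(a = a[i], b = bj, c = cj)
            for i in eachindex(b)
            for (bj, cj) in zip(split(b[i], ";"), split(c[i], ";"))]
# length(expanded) == 6; expanded[2] == (a = 2, b = "aa", c = "2")
```

The result is a vector of named tuples, one per output row, in the same order as the table above.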
This page was generated using Literate.jl.