@distinct
The `@distinct()` macro in TidierData.jl selects distinct rows. Like its R counterpart, it can be used with or without arguments. When arguments are provided, it behaves slightly differently from the R version. The R function by default returns only the provided columns. The TidierData.jl version returns all columns, and takes the values of the non-selected columns from the first matching row.
```julia
using TidierData

# 10 rows; column b repeats each letter 'a' through 'e' twice
df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))
```
Row | a (Int64) | b (Char) |
---|---|---|
1 | 1 | a |
2 | 2 | a |
3 | 3 | b |
4 | 4 | b |
5 | 5 | c |
6 | 6 | c |
7 | 7 | d |
8 | 8 | d |
9 | 9 | e |
10 | 10 | e |
Select distinct values overall
With no arguments, `@distinct()` compares entire rows. Since `df` contains no duplicate rows, all 10 rows are returned.
```julia
@chain df begin
    @distinct()
end
```
Row | a (Int64) | b (Char) |
---|---|---|
1 | 1 | a |
2 | 2 | a |
3 | 3 | b |
4 | 4 | b |
5 | 5 | c |
6 | 6 | c |
7 | 7 | d |
8 | 8 | d |
9 | 9 | e |
10 | 10 | e |
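The example above cannot show rows being dropped, because `df` has no duplicates. The following sketch uses a hypothetical data frame `df_dup` (not part of the original example) in which the first two rows are identical. This makes it possible to see the no-argument form keep only the first occurrence.

```julia
using TidierData

# Hypothetical data frame whose first two rows are exact duplicates
df_dup = DataFrame(a = [1, 1, 2], b = ['x', 'x', 'y'])

# With no arguments, @distinct() compares whole rows and keeps the first occurrence
deduped = @chain df_dup begin
    @distinct()
end
# deduped has 2 rows: (a = 1, b = 'x') and (a = 2, b = 'y')
```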
Select distinct values based on column b
Notice that, for every distinct value of column `b`, the value of column `a` comes from the first matching row. This differs slightly from R's tidyverse, which by default would have returned only column `b`.
```julia
@chain df begin
    @distinct(b)
end
```
Row | a (Int64) | b (Char) |
---|---|---|
1 | 1 | a |
2 | 3 | b |
3 | 5 | c |
4 | 7 | d |
5 | 9 | e |
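If you want output shaped like R's tidyverse result (only column `b`), one option is to select `b` first and then call `@distinct()` with no arguments. The sketch below does this and sets it beside the TidierData.jl result. For comparison, it also calls DataFrames.jl's `unique(df, :b)`, which likewise keeps the first occurrence of each distinct `b` while returning all columns. That comparison is included only for illustration. It is not a claim about how `@distinct` is implemented.

```julia
using TidierData

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))

# TidierData.jl behavior: all columns, first matching row per distinct value of b
tidier_style = @chain df begin
    @distinct(b)
end

# tidyverse-like shape: keep only column b, then drop duplicate rows
tidyverse_style = @chain df begin
    @select(b)
    @distinct()
end

# DataFrames.jl comparison: unique on column b also returns complete rows,
# retaining the first occurrence of each value of b
dataframes_style = unique(df, :b)
```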
In TidierData.jl, `@distinct()` also works with grouped data frames. If the data frame is grouped, `@distinct()` ignores the grouping when determining distinct values. It still returns the result grouped by the original grouping columns.
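The grouped behavior described above can be sketched as follows. The grouping column `g` is hypothetical: it is `true` where `a > 5`. It is chosen so that the two rows with `b == 'c'` (`a == 5` and `a == 6`) fall into different groups. If the grouping were respected, `'c'` would appear once per group. Because `@distinct()` ignores the grouping, we expect only its first match (`a == 5`) to remain, giving 5 rows. The `using DataFrames` line assumes DataFrames.jl is installed in your environment. It is needed for `GroupedDataFrame` and `groupcols`.

```julia
using TidierData
using DataFrames  # assumed available; provides GroupedDataFrame and groupcols

# Hypothetical grouping column g: false for a = 1:5, true for a = 6:10
df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2), g = (1:10) .> 5)

result = @chain df begin
    @group_by(g)
    @distinct(b)  # distinct values of b across the whole data frame, not per group
end

result isa GroupedDataFrame  # check that the result is still grouped
groupcols(result)            # grouping column(s) carried over from @group_by
DataFrame(result)            # combine the groups into one DataFrame for inspection
```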
This page was generated using Literate.jl.