@distinct
The @distinct() macro in TidierData.jl selects distinct rows. Like its R counterpart, it can be used with or without arguments. When arguments are provided, it behaves slightly differently from the R version: whereas the R function returns only the provided columns, the TidierData.jl version returns all columns, keeping the values from the first matching row for the non-selected columns.
using TidierData
df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))
| Row | a | b |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 2 | a |
| 3 | 3 | b |
| 4 | 4 | b |
| 5 | 5 | c |
| 6 | 6 | c |
| 7 | 7 | d |
| 8 | 8 | d |
| 9 | 9 | e |
| 10 | 10 | e |
Select distinct values overall
Since there are no duplicate rows, this will return all rows.
@chain df begin
    @distinct()
end
| Row | a | b |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 2 | a |
| 3 | 3 | b |
| 4 | 4 | b |
| 5 | 5 | c |
| 6 | 6 | c |
| 7 | 7 | d |
| 8 | 8 | d |
| 9 | 9 | e |
| 10 | 10 | e |
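Because the example data frame has no repeated rows, the call above leaves it unchanged. As a small illustrative sketch, the hypothetical data frame `df_dup` below (not part of the original example) contains one exact duplicate row, which `@distinct()` without arguments should drop:

```julia
using TidierData

# Hypothetical data for illustration: the first two rows are identical
df_dup = DataFrame(a = [1, 1, 2], b = ['x', 'x', 'y'])

# With no arguments, all columns are compared when looking for duplicates
deduped = @chain df_dup begin
    @distinct()
end
```

Here `deduped` is expected to keep one copy of the repeated row alongside the row that was already unique.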
Select distinct values based on column b
Notice that for every distinct value of column b, the first matching row is returned, including its value of column a. This differs slightly from R's tidyverse, which would have returned only column b.
@chain df begin
    @distinct(b)
end
| Row | a | b |
|---|---|---|
| | Int64 | Char |
| 1 | 1 | a |
| 2 | 3 | b |
| 3 | 5 | c |
| 4 | 7 | d |
| 5 | 9 | e |
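If only the distinct values of column b are wanted, as the R version would return, one possible approach is to follow `@distinct(b)` with `@select(b)`. This is a minimal sketch of that idea rather than an officially recommended pattern; the name `only_b` is chosen here purely for illustration:

```julia
using TidierData

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))

# Keep the first row for each distinct value of b,
# then drop every column other than b
only_b = @chain df begin
    @distinct(b)
    @select(b)
end
```

The result should contain a single column, b, with one row per distinct value.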
In TidierData.jl, @distinct() works with grouped data frames. If grouped, @distinct() will ignore the grouping when determining distinct values but will return the data frame in grouped form based on the original groupings.
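The grouped case can be sketched as follows, assuming the behavior described above. The data frame is grouped by b with `@group_by()`, passed through `@distinct(b)`, and then turned back into a plain data frame with `@ungroup()`. The variable names `grouped` and `ungrouped` are illustrative only:

```julia
using TidierData

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))

# Group by b, then keep the first row for each distinct value of b.
# Per the description above, the result should still be grouped by b.
grouped = @chain df begin
    @group_by(b)
    @distinct(b)
end

# Remove the grouping to get an ordinary data frame back
ungrouped = @chain grouped begin
    @ungroup()
end
```

Calling `length(grouped)` reports the number of groups, which is a quick way to confirm that the grouping was retained.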