
@distinct

The @distinct() macro in TidierData.jl selects distinct rows. Like its R counterpart, dplyr's distinct(), it can be used with or without arguments. When arguments are provided, it behaves slightly differently from the R version. By default, the R function returns only the provided columns. The TidierData.jl version returns all columns, and the values of the non-selected columns are taken from the first matching row.

using TidierData

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))
10×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  a
   3 │     3  b
   4 │     4  b
   5 │     5  c
   6 │     6  c
   7 │     7  d
   8 │     8  d
   9 │     9  e
  10 │    10  e
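When @distinct() is given a column, it keeps the first matching row and all columns. For comparison, this resembles `unique` with a column argument in DataFrames.jl, which also keeps every column and the first occurrence of each key. The sketch below uses plain `unique` only as an analogy. It is not a description of how TidierData.jl implements the macro, and the variable names are illustrative.

```julia
using TidierData  # loads DataFrames.jl, so DataFrame and its unique methods are available

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))

# Whole-row deduplication: there are no duplicate rows, so all 10 rows remain.
all_rows = unique(df)

# Deduplicate on column b only: both columns are kept, and for each
# distinct value of b the first row in the data frame is retained.
first_per_b = unique(df, :b)
```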

Select distinct values overall

Since there are no duplicate rows, this will return all rows.

@chain df begin
    @distinct()
end
10×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     2  a
   3 │     3  b
   4 │     4  b
   5 │     5  c
   6 │     6  c
   7 │     7  d
   8 │     8  d
   9 │     9  e
  10 │    10  e
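The data frame above contains no duplicate rows, so @distinct() leaves it unchanged. The following minimal sketch uses a small hypothetical data frame, `df_dup`, that does contain duplicates. It contrasts calling @distinct() with no arguments against calling it with a column argument.

```julia
using TidierData

# Hypothetical data: rows 1 and 2 are exact duplicates, and all three rows
# share the same value of b.
df_dup = DataFrame(a = [1, 1, 2], b = ['x', 'x', 'x'])

# Without arguments, only exact duplicate rows are dropped, so 2 rows remain.
no_args = @chain df_dup begin
    @distinct()
end

# With b as the argument, one row is kept per distinct value of b (here a
# single row), and column a comes from the first matching row.
by_b = @chain df_dup begin
    @distinct(b)
end
```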

Select distinct values based on column b

Notice that, for every distinct value of column b, the value of column a is taken from the first row with that value of b. This differs slightly from R's tidyverse, where distinct(df, b) would return only column b unless .keep_all = TRUE is set.

@chain df begin
    @distinct(b)
end
5×2 DataFrame
 Row │ a      b
     │ Int64  Char
─────┼─────────────
   1 │     1  a
   2 │     3  b
   3 │     5  c
   4 │     7  d
   5 │     9  e
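To reproduce the R behavior, where only the selected column is returned, you can follow @distinct() with @select(). This is a minimal sketch. It is simply one way to chain the two TidierData.jl macros, and the variable name `only_b` is illustrative.

```julia
using TidierData

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))

# Keep one row per distinct value of b, then drop every other column.
# The result mirrors what dplyr::distinct(df, b) returns in R.
only_b = @chain df begin
    @distinct(b)
    @select(b)
end
```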

In TidierData.jl, @distinct() also works with grouped data frames. When it is applied to a grouped data frame, @distinct() ignores the grouping when determining distinct values. The result is then returned grouped by the original grouping columns.
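The following minimal sketch of the grouped case assumes the behavior described above. Grouping by a puts each row in its own group. If @distinct(b) respected the grouping, it would therefore keep all 10 rows. If the grouping is ignored, only the first row for each distinct value of b remains (5 rows), and the result comes back grouped by a. @group_by and @ungroup are TidierData.jl macros, and the variable names are illustrative.

```julia
using TidierData

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))

# Each value of a forms its own single-row group.
# Because @distinct() ignores the grouping, only the first row for each
# distinct b survives; the output is regrouped by a.
grouped_result = @chain df begin
    @group_by(a)
    @distinct(b)
end

# Convert back to an ordinary DataFrame to inspect the surviving rows.
flat_result = @chain grouped_result begin
    @ungroup
end
```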


This page was generated using Literate.jl.