@distinct

The @distinct() macro in TidierData.jl selects distinct rows. Like its R counterpart, it can be used with or without arguments. When arguments are provided, it behaves slightly differently from the R version: whereas the R function returns only the provided columns, the TidierData.jl version returns all columns, taking the values of the non-selected columns from the first matching row.

using TidierData

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))

10×2 DataFrame

Row	a	b
	Int64	Char
1	1	a
2	2	a
3	3	b
4	4	b
5	5	c
6	6	c
7	7	d
8	8	d
9	9	e
10	10	e

Select distinct values overall

Since there are no duplicate rows, this will return all rows.

@chain df begin
    @distinct()
end

10×2 DataFrame

Row	a	b
	Int64	Char
1	1	a
2	2	a
3	3	b
4	4	b
5	5	c
6	6	c
7	7	d
8	8	d
9	9	e
10	10	e
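
To see `@distinct()` actually drop rows, here is a minimal sketch using a small data frame that contains one repeated row. The data frame `df_dup` and its values are made up for illustration and are not part of the original example.

```julia
using TidierData

# Hypothetical data: the first two rows are exact duplicates.
df_dup = DataFrame(a = [1, 1, 2], b = ['x', 'x', 'y'])

# With no arguments, @distinct() compares entire rows.
deduped = @chain df_dup begin
    @distinct()
end
```

Under these assumptions, `deduped` should keep a single copy of the repeated row alongside the unique one.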

Select distinct values based on column `b`

Notice that for every distinct value of column b, the value of column a comes from the first matching row. This differs slightly from R's tidyverse, which would have returned only column b.

@chain df begin
    @distinct(b)
end

5×2 DataFrame

Row	a	b
	Int64	Char
1	1	a
2	3	b
3	5	c
4	7	d
5	9	e
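
If only the distinct values of column b are wanted, as in the R version, one possible approach is to follow `@distinct(b)` with `@select(b)`. This is a sketch of one way to do it, not necessarily the approach the package authors recommend; the variable name `only_b` is chosen here for illustration.

```julia
using TidierData

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))

# Keep the first row for each distinct value of b,
# then drop every column except b.
only_b = @chain df begin
    @distinct(b)
    @select(b)
end
```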

In TidierData.jl, @distinct() works with grouped data frames. If grouped, @distinct() will ignore the grouping when determining distinct values but will return the data frame in grouped form based on the original groupings.
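
The grouped case described above can be tried with a short sketch like the following. It assumes the same `df` as earlier on this page and groups by column b before calling `@distinct(b)`; the variable name `grouped_result` is illustrative. Printing the result's type is a simple way to check whether the grouping was kept.

```julia
using TidierData

df = DataFrame(a = 1:10, b = repeat('a':'e', inner = 2))

# Group by b, then keep one row per distinct value of b.
grouped_result = @chain df begin
    @group_by(b)
    @distinct(b)
end

# Inspect the type to check whether the result is still grouped.
println(typeof(grouped_result))
```

Appending `@ungroup` to the chain should return an ordinary data frame if the grouping is not needed downstream.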

This page was generated using Literate.jl.
