@group_by

Grouping and ungrouping behavior is one of the nicest parts of using R's tidyverse. Once a data frame is grouped, every verb applied to it respects the grouping, including but not limited to @mutate(), @summarize(), @slice() and @filter(). This makes per-group operations short to express. For example, @group_by() followed by @filter() limits the rows of a dataset to the maximum or minimum values within each group.
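The snippet below shows a verb respecting the grouping on a small scale. It is a minimal sketch that assumes TidierData and DataFrames are installed. The data frame `df` and its columns `g` and `x` are hypothetical and invented for illustration. Because the data frame is grouped, @slice(1) should keep the first row of each group rather than only the first row overall.

```julia
using TidierData, DataFrames

# Hypothetical toy data: group "a" has 2 rows, group "b" has 3 rows
df = DataFrame(g = ["a", "a", "b", "b", "b"], x = [10, 20, 30, 40, 50])

# While grouped, @slice(1) is applied within each group, not to the whole table
first_per_group = @chain df begin
    @group_by(g)
    @slice(1)
    @ungroup
end

println(first_per_group)
```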

Exactly as in R's tidyverse, once a data frame is grouped, it remains grouped until one of two things happens. Calling @summarize() "peels off" one layer of grouping, and calling @ungroup() removes all layers of grouping. Also as in R's tidyverse, @group_by() sorts the groups in ascending order. Unlike in R, there is never any question about whether a data frame is currently grouped, because a GroupedDataFrame prints in a very different form from a DataFrame and the two are easy to tell apart.
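The peeling behavior can be checked directly on the types. This is a minimal sketch that assumes TidierData, DataFrames and Statistics are available. The toy data frame and its columns `g1`, `g2` and `x` are hypothetical. Grouping by two columns and then summarizing should leave a result that is still grouped by the first column only, and @ungroup should then return a plain DataFrame. `groupcols` is the DataFrames.jl function that lists the grouping columns of a GroupedDataFrame.

```julia
using TidierData, DataFrames, Statistics

# Hypothetical toy data with two grouping columns
df = DataFrame(g1 = ["a", "a", "b", "b"],
               g2 = [1, 2, 1, 1],
               x  = [1.0, 2.0, 3.0, 5.0])

grouped = @group_by(df, g1, g2)                  # two layers of grouping: g1, then g2
peeled  = @summarize(grouped, x_mean = mean(x))  # one row per (g1, g2); the g2 layer is peeled off
flat    = @ungroup(peeled)                       # all remaining grouping removed

println(typeof(grouped))    # a GroupedDataFrame
println(groupcols(peeled))  # grouping columns that survive @summarize
println(typeof(flat))       # a plain DataFrame
```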

When using @chain, you can write either @ungroup or @ungroup(). Both forms are valid.
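The following minimal sketch shows that the two spellings give the same result. It assumes TidierData and DataFrames are installed, and the toy data frame is hypothetical. Inside @chain, the piped data frame is inserted as the first argument of each macro call, with or without parentheses.

```julia
using TidierData, DataFrames

# Hypothetical toy data
df = DataFrame(g = ["a", "b", "a"], x = [1, 2, 3])

bare = @chain df begin
    @group_by(g)
    @ungroup        # no parentheses
end

called = @chain df begin
    @group_by(g)
    @ungroup()      # with parentheses
end

println(bare == called)
```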

```julia
using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");
```

Combining @group_by() with @mutate()

```julia
@chain movies begin
    @group_by(Year)
    @mutate(Mean_Yearly_Rating = mean(skipmissing(Rating)))
    @select(Year, Rating, Mean_Yearly_Rating)
    @ungroup
    @slice(1:5)
end
```

```
5×3 DataFrame
 Row │ Year   Rating   Mean_Yearly_Rating
     │ Int32  Float64  Float64
─────┼────────────────────────────────────
   1 │  1971      6.4             5.66517
   2 │  1939      6.0             6.35041
   3 │  1941      8.2             6.34107
   4 │  1996      8.2             5.74712
   5 │  1975      3.4             5.62908
```
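Unlike @summarize(), a grouped @mutate() keeps every row and repeats each group's value on that group's rows, just as Mean_Yearly_Rating is repeated for every movie from the same year. The following minimal sketch shows this on hypothetical toy data and assumes TidierData, DataFrames and Statistics are installed. It relies on TidierData not auto-vectorizing `mean`, as in the movies example above.

```julia
using TidierData, DataFrames, Statistics

# Hypothetical toy data: group "a" has mean 2.0, group "b" has mean 4.0
df = DataFrame(g = ["a", "a", "b", "b", "b"], x = [1.0, 3.0, 2.0, 4.0, 6.0])

with_means = @chain df begin
    @group_by(g)
    @mutate(x_mean = mean(x))  # computed once per group, repeated on each row of that group
    @ungroup
end

println(with_means)
```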

Combining @group_by() with @summarize()

```julia
@chain movies begin
    @group_by(Year)
    @summarize(Mean_Yearly_Rating = mean(skipmissing(Rating)),
        Median_Yearly_Rating = median(skipmissing(Rating)))
    @slice(1:5)
end
```

```
5×3 DataFrame
 Row │ Year   Mean_Yearly_Rating  Median_Yearly_Rating
     │ Int32  Float64             Float64
─────┼─────────────────────────────────────────────────
   1 │  1971             5.66517                   5.8
   2 │  1939             6.35041                   6.4
   3 │  1941             6.34107                   6.4
   4 │  1996             5.74712                   5.9
   5 │  1975             5.62908                   5.7
```
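With a single grouping column, @summarize() collapses each group to one row and peels off the only layer of grouping. That is why the example above needs no @ungroup. The following minimal sketch uses hypothetical toy data and assumes TidierData, DataFrames and Statistics are installed.

```julia
using TidierData, DataFrames, Statistics

# Hypothetical toy data
df = DataFrame(g = ["a", "a", "b", "b", "b"], x = [1.0, 3.0, 2.0, 4.0, 9.0])

per_group = @chain df begin
    @group_by(g)
    @summarize(x_mean = mean(x), x_median = median(x))  # one row per group
end

println(per_group)
```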

Grouping by multiple columns

```julia
@chain movies begin
    @group_by(Year, Comedy)
    @summarize(Mean_Yearly_Rating = mean(skipmissing(Rating)),
        Median_Yearly_Rating = median(skipmissing(Rating)))
    @ungroup # @summarize peeled off Comedy; @ungroup also removes the remaining Year grouping
    @arrange(desc(Year), Comedy)
    @slice(1:5)
end
```

```
5×4 DataFrame
 Row │ Year   Comedy  Mean_Yearly_Rating  Median_Yearly_Rating
     │ Int32  Int32   Float64             Float64
─────┼─────────────────────────────────────────────────────────
   1 │  2005       0             6.62788                  6.75
   2 │  2005       1             6.30081                  6.1
   3 │  2004       0             6.76521                  6.9
   4 │  2004       1             6.42898                  6.6
   5 │  2003       0             6.40409                  6.6
```
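The following minimal sketch applies the same pattern to a hypothetical miniature table whose columns `year`, `comedy` and `rating` are invented stand-ins for the movies columns. It assumes TidierData, DataFrames and Statistics are installed. After @ungroup, @arrange sorts across the whole table instead of within the leftover year groups.

```julia
using TidierData, DataFrames, Statistics

# Hypothetical toy data standing in for Year / Comedy / Rating
df = DataFrame(year   = [2004, 2004, 2005, 2005, 2005],
               comedy = [0, 1, 0, 1, 1],
               rating = [6.0, 7.0, 5.0, 8.0, 6.0])

by_year_genre = @chain df begin
    @group_by(year, comedy)
    @summarize(mean_rating = mean(rating))  # result is still grouped by year
    @ungroup                                # drop the remaining year grouping
    @arrange(desc(year), comedy)            # newest year first, then comedy 0 before 1
end

println(by_year_genre)
```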

Combining @group_by() with @filter()

```julia
@chain movies begin
    @group_by(Year)
    @filter(Rating == minimum(Rating))
    @ungroup
    @select(Year, Rating)
    @arrange(desc(Year))
    @slice(1:10)
end
```

```
10×2 DataFrame
 Row │ Year   Rating
     │ Int32  Float64
─────┼────────────────
   1 │  2005      1.8
   2 │  2004      1.0
   3 │  2004      1.0
   4 │  2004      1.0
   5 │  2004      1.0
   6 │  2004      1.0
   7 │  2004      1.0
   8 │  2004      1.0
   9 │  2003      1.0
  10 │  2003      1.0
```
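In the output above, several 2004 movies tie at the minimum rating of 1.0, and all of them are kept. A per-group @filter() returns every row that matches its group's extreme value, not just one row. The following minimal sketch reproduces this with the maximum instead of the minimum. It uses hypothetical toy data and assumes TidierData and DataFrames are installed.

```julia
using TidierData, DataFrames

# Hypothetical toy data: group "b" has a tie at its maximum value 6
df = DataFrame(g = ["a", "a", "b", "b", "b"], x = [1, 3, 2, 6, 6])

group_max = @chain df begin
    @group_by(g)
    @filter(x == maximum(x))  # maximum is taken within each group
    @ungroup
end

println(group_max)
```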

This page was generated using Literate.jl.
