@group_by
Grouping and ungrouping behavior is one of the nicest parts of using R's tidyverse. Once a data frame is grouped, every verb applied to it respects the grouping, including but not limited to @mutate(), @summarize(), @slice(), and @filter(), which allows for powerful abstractions. For example, with @group_by() followed by @filter(), you can limit the rows of a dataset to those holding the maximum or minimum value within each group.
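As a minimal sketch of the per-group maximum described above, the following uses a small made-up table. The `scores` data frame and its column names are hypothetical, and the explicit `using DataFrames` assumes DataFrames.jl is available in the active environment.

```julia
using TidierData
using DataFrames  # assumption: DataFrames.jl is installed alongside TidierData

# Hypothetical toy data: two teams with two scores each
scores = DataFrame(team = ["a", "a", "b", "b"], score = [1, 5, 3, 2])

# Keep only the row(s) holding the highest score within each team
top_scores = @chain scores begin
    @group_by(team)
    @filter(score == maximum(score))
    @ungroup
end
```

Because `maximum(score)` is evaluated within each group, one row per team should survive here. Ties within a group would keep every tied row.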
Exactly as in R's tidyverse, once a data frame is grouped, it remains grouped until either @summarize() is called (which "peels off" one layer of grouping) or @ungroup() is called (which removes all layers of grouping). Also as in R's tidyverse, @group_by() sorts the groups in ascending order. Unlike in R, there is never any question about whether a data frame is currently grouped, because a GroupedDataFrame prints in a very different form than a DataFrame, making the two easy to tell apart.
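The layered behavior described above can be checked on a small example. This is a sketch under stated assumptions: the toy data frame `df` is hypothetical, and `groupcols` and the `GroupedDataFrame` type come from DataFrames.jl, which is loaded explicitly here.

```julia
using TidierData
using DataFrames  # assumption: loaded directly for GroupedDataFrame and groupcols

# Hypothetical toy data with two candidate grouping columns
df = DataFrame(a = ["x", "x", "y", "y"], b = [1, 2, 1, 2], v = [10, 20, 30, 40])

grouped = @group_by(df, a, b)                  # two layers of grouping: a, then b
peeled  = @summarize(grouped, total = sum(v))  # should peel off the last layer (b)
flat    = @ungroup(peeled)                     # removes whatever grouping remains

# Inspect what each step produced
println(typeof(grouped) <: GroupedDataFrame)
println(groupcols(peeled))
println(typeof(flat) <: DataFrame)
```

If `@summarize()` behaves as the text states, `peeled` is still grouped by `a` alone, and only the `@ungroup()` call returns a plain DataFrame.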
When using @chain, note that you can write either @ungroup or @ungroup(). Both are valid.
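A short sketch of the two spellings side by side, using a hypothetical toy data frame. It assumes DataFrames.jl can be loaded directly for the `DataFrame` constructor.

```julia
using TidierData
using DataFrames  # assumption: DataFrames.jl is installed alongside TidierData

# Hypothetical toy data
df = DataFrame(g = ["x", "y", "x"], v = [1, 2, 3])

# Bare macro name, no parentheses
bare = @chain df begin
    @group_by(g)
    @ungroup
end

# Same pipeline with empty parentheses
with_parens = @chain df begin
    @group_by(g)
    @ungroup()
end
```

Both pipelines should return an ordinary DataFrame with identical contents, so the choice between them is purely stylistic.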
using TidierData
using RDatasets
movies = dataset("ggplot2", "movies");
Combining @group_by() with @mutate()¤
@chain movies begin
    @group_by(Year)
    @mutate(Mean_Yearly_Rating = mean(skipmissing(Rating)))
    @select(Year, Rating, Mean_Yearly_Rating)
    @ungroup
    @slice(1:5)
end
| Row | Year | Rating | Mean_Yearly_Rating |
|---|---|---|---|
| | Int32 | Float64 | Float64 |
| 1 | 1971 | 6.4 | 5.66517 |
| 2 | 1939 | 6.0 | 6.35041 |
| 3 | 1941 | 8.2 | 6.34107 |
| 4 | 1996 | 8.2 | 5.74712 |
| 5 | 1975 | 3.4 | 5.62908 |
Combining @group_by() with @summarize()¤
@chain movies begin
    @group_by(Year)
    @summarize(Mean_Yearly_Rating = mean(skipmissing(Rating)),
               Median_Yearly_Rating = median(skipmissing(Rating)))
    @slice(1:5)
end
| Row | Year | Mean_Yearly_Rating | Median_Yearly_Rating |
|---|---|---|---|
| | Int32 | Float64 | Float64 |
| 1 | 1971 | 5.66517 | 5.8 |
| 2 | 1939 | 6.35041 | 6.4 |
| 3 | 1941 | 6.34107 | 6.4 |
| 4 | 1996 | 5.74712 | 5.9 |
| 5 | 1975 | 5.62908 | 5.7 |
Grouping by multiple columns¤
@chain movies begin
    @group_by(Year, Comedy)
    @summarize(Mean_Yearly_Rating = mean(skipmissing(Rating)),
               Median_Yearly_Rating = median(skipmissing(Rating)))
    @ungroup # @summarize() peeled off Comedy; ungroup to remove the remaining grouping by Year
    @arrange(desc(Year), Comedy)
    @slice(1:5)
end
| Row | Year | Comedy | Mean_Yearly_Rating | Median_Yearly_Rating |
|---|---|---|---|---|
| | Int32 | Int32 | Float64 | Float64 |
| 1 | 2005 | 0 | 6.62788 | 6.75 |
| 2 | 2005 | 1 | 6.30081 | 6.1 |
| 3 | 2004 | 0 | 6.76521 | 6.9 |
| 4 | 2004 | 1 | 6.42898 | 6.6 |
| 5 | 2003 | 0 | 6.40409 | 6.6 |
Combining @group_by() with @filter()¤
@chain movies begin
    @group_by(Year)
    @filter(Rating == minimum(Rating))
    @ungroup
    @select(Year, Rating)
    @arrange(desc(Year))
    @slice(1:10)
end
| Row | Year | Rating |
|---|---|---|
| | Int32 | Float64 |
| 1 | 2005 | 1.8 |
| 2 | 2004 | 1.0 |
| 3 | 2004 | 1.0 |
| 4 | 2004 | 1.0 |
| 5 | 2004 | 1.0 |
| 6 | 2004 | 1.0 |
| 7 | 2004 | 1.0 |
| 8 | 2004 | 1.0 |
| 9 | 2003 | 1.0 |
| 10 | 2003 | 1.0 |