@group_by
Grouping and ungrouping behavior is one of the nicest parts of using R's tidyverse. Once a data frame is grouped, every verb applied to it respects the grouping, including but not limited to @mutate(), @summarize(), @slice(), and @filter(). This allows for powerful abstractions: for example, @group_by() followed by @filter() limits the rows of a dataset to those holding the maximum or minimum value within each group.
Exactly as in R's tidyverse, a grouped data frame remains grouped until either @summarize() is called, which "peels off" one layer of grouping, or @ungroup() is called, which removes all layers of grouping. Also as in R's tidyverse, @group_by() sorts the groups in ascending order. Unlike in R, there is never any question about whether a data frame is currently grouped: a GroupedDataFrame prints in a very different form than a DataFrame, so the two are easy to tell apart.
Inside @chain, you can write either @ungroup or @ungroup(); both forms are valid.
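The grouping rules described above can be sketched on a small table. The data frame `df` and its columns `a`, `b`, and `v` are made up for illustration and are not part of the movies dataset. The sketch also assumes DataFrames.jl is installed alongside TidierData so that `GroupedDataFrame` is in scope for the type checks.

```julia
using TidierData
using DataFrames  # assumed available; provides DataFrame and GroupedDataFrame

# Hypothetical toy data, not taken from the movies dataset
df = DataFrame(a = [1, 1, 2, 2], b = ["x", "y", "x", "y"], v = [1.0, 2.0, 3.0, 4.0])

# Two layers of grouping: a, then b
grouped = @chain df begin
    @group_by(a, b)
end

# @summarize() peels off the innermost layer (b); per the text above,
# the result should still be grouped by a
peeled = @chain grouped begin
    @summarize(total = sum(v))
end

# @ungroup removes whatever grouping is left
flat = @chain peeled begin
    @ungroup
end

# The form with parentheses is expected to give the same result
flat_parens = @chain peeled begin
    @ungroup()
end

println(grouped isa GroupedDataFrame)
println(flat isa DataFrame)
println(flat == flat_parens)
```

Checking the type with `isa` is a programmatic alternative to reading the printed output when a pipeline needs to confirm that grouping has been fully removed.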
using TidierData
using RDatasets
movies = dataset("ggplot2", "movies");
Combining @group_by() with @mutate()
@chain movies begin
    @group_by(Year)
    @mutate(Mean_Yearly_Rating = mean(skipmissing(Rating)))
    @select(Year, Rating, Mean_Yearly_Rating)
    @ungroup
    @slice(1:5)
end
| Row | Year | Rating | Mean_Yearly_Rating |
|---|---|---|---|
| | Int32 | Float64 | Float64 |
| 1 | 1971 | 6.4 | 5.66517 |
| 2 | 1939 | 6.0 | 6.35041 |
| 3 | 1941 | 8.2 | 6.34107 |
| 4 | 1996 | 8.2 | 5.74712 |
| 5 | 1975 | 3.4 | 5.62908 |
Combining @group_by() with @summarize()
@chain movies begin
    @group_by(Year)
    @summarize(Mean_Yearly_Rating = mean(skipmissing(Rating)),
               Median_Yearly_Rating = median(skipmissing(Rating)))
    @slice(1:5)
end
| Row | Year | Mean_Yearly_Rating | Median_Yearly_Rating |
|---|---|---|---|
| | Int32 | Float64 | Float64 |
| 1 | 1971 | 5.66517 | 5.8 |
| 2 | 1939 | 6.35041 | 6.4 |
| 3 | 1941 | 6.34107 | 6.4 |
| 4 | 1996 | 5.74712 | 5.9 |
| 5 | 1975 | 5.62908 | 5.7 |
Grouping by multiple columns
@chain movies begin
    @group_by(Year, Comedy)
    @summarize(Mean_Yearly_Rating = mean(skipmissing(Rating)),
               Median_Yearly_Rating = median(skipmissing(Rating)))
    @ungroup # Need to ungroup to peel off grouping by Year
    @arrange(desc(Year), Comedy)
    @slice(1:5)
end
| Row | Year | Comedy | Mean_Yearly_Rating | Median_Yearly_Rating |
|---|---|---|---|---|
| | Int32 | Int32 | Float64 | Float64 |
| 1 | 2005 | 0 | 6.62788 | 6.75 |
| 2 | 2005 | 1 | 6.30081 | 6.1 |
| 3 | 2004 | 0 | 6.76521 | 6.9 |
| 4 | 2004 | 1 | 6.42898 | 6.6 |
| 5 | 2003 | 0 | 6.40409 | 6.6 |
Combining @group_by() with @filter()
@chain movies begin
    @group_by(Year)
    @filter(Rating == minimum(Rating))
    @ungroup
    @select(Year, Rating)
    @arrange(desc(Year))
    @slice(1:10)
end
| Row | Year | Rating |
|---|---|---|
| | Int32 | Float64 |
| 1 | 2005 | 1.8 |
| 2 | 2004 | 1.0 |
| 3 | 2004 | 1.0 |
| 4 | 2004 | 1.0 |
| 5 | 2004 | 1.0 |
| 6 | 2004 | 1.0 |
| 7 | 2004 | 1.0 |
| 8 | 2004 | 1.0 |
| 9 | 2003 | 1.0 |
| 10 | 2003 | 1.0 |
This page was generated using Literate.jl.