
@group_by

Grouping and ungrouping behavior is one of the nicest parts of using R's tidyverse. Once a data frame is grouped, every verb applied to it respects the grouping, including but not limited to @mutate(), @summarize(), @slice() and @filter(). This makes for powerful abstractions. For example, @group_by() followed by @filter() lets you limit the rows of a dataset to the maximum or minimum values within each group.
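Here is a minimal sketch of a grouped verb other than @filter(). It applies @slice() after @group_by() to keep the first row of each group. The toy data frame and its columns `g` and `x` are hypothetical. The snippet assumes DataFrames.jl is installed alongside TidierData.jl.

```julia
using TidierData   # provides @chain, @group_by, @slice, @ungroup
using DataFrames   # assumed installed; provides the DataFrame constructor

# Hypothetical toy data: group "a" has 2 rows, group "b" has 3
df = DataFrame(g = ["a", "a", "b", "b", "b"], x = [1, 2, 3, 4, 5])

first_per_group = @chain df begin
    @group_by(g)
    @slice(1:1)   # row 1 within each group, not of the whole data frame
    @ungroup
end
```

Without the @group_by() step, the same @slice(1:1) would return only the first row of the whole data frame.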

Exactly as in R's tidyverse, once a data frame is grouped, it remains grouped until either @summarize() is called (which "peels off" one layer of grouping) or @ungroup() is called (which removes all layers of grouping). Also as in R's tidyverse, @group_by() sorts the groups in ascending order. Unlike in R, there is never any question about whether a data frame is currently grouped, because a GroupedDataFrame prints in a very different form from a DataFrame, making the two easy to tell apart.
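The sketch below checks these grouping rules on a hypothetical toy data frame with columns `team` and `score`. It shows that @mutate() keeps the grouping, while @summarize() on a single grouping column peels off the only layer and returns a plain DataFrame. It assumes DataFrames.jl is available in the active environment for the `DataFrame` and `GroupedDataFrame` type names.

```julia
using TidierData
using DataFrames   # assumed installed; DataFrame and GroupedDataFrame types
using Statistics   # mean

# Hypothetical toy data
df = DataFrame(team = ["red", "blue", "red"], score = [3, 5, 7])

grouped    = @group_by(df, team)                      # one layer of grouping
mutated    = @mutate(grouped, best = maximum(score))  # grouping is kept
summarized = @summarize(grouped, avg = mean(score))   # the only layer is peeled off
flat       = @ungroup(mutated)                        # all layers removed
```

Printing `grouped` and `flat` in the REPL is the quickest way to see the different display forms.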

When using @chain, you can write either @ungroup or @ungroup(); both forms are valid.
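The following minimal check uses hypothetical toy data. It confirms that the two spellings of @ungroup produce the same result inside @chain.

```julia
using TidierData
using DataFrames   # assumed installed

# Hypothetical toy data
df = DataFrame(k = ["a", "b", "a"], v = [1, 2, 3])

with_parens = @chain df begin
    @group_by(k)
    @mutate(v2 = v * 2)   # doubles each value of v
    @ungroup()
end

without_parens = @chain df begin
    @group_by(k)
    @mutate(v2 = v * 2)
    @ungroup
end
```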

using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");

Combining @group_by() with @mutate()

@chain movies begin
    @group_by(Year)
    @mutate(Mean_Yearly_Rating = mean(skipmissing(Rating)))
    @select(Year, Rating, Mean_Yearly_Rating)
    @ungroup
    @slice(1:5)
end
5×3 DataFrame
 Row │ Year   Rating   Mean_Yearly_Rating
     │ Int32  Float64  Float64
─────┼────────────────────────────────────
   1 │  1971      6.4             5.66517
   2 │  1939      6.0             6.35041
   3 │  1941      8.2             6.34107
   4 │  1996      8.2             5.74712
   5 │  1975      3.4             5.62908
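A small hypothetical data frame makes the per-group arithmetic of @group_by() plus @mutate() easy to verify. Here `mean(x)` is computed within each group of `g`, and the scalar result is repeated on every row of that group. The column names are illustrative only, and the sketch relies on TidierData's auto-vectorization to turn `x - mean(x)` into an element-wise subtraction.

```julia
using TidierData
using DataFrames   # assumed installed
using Statistics   # mean

# Hypothetical toy data, already ordered by group
df = DataFrame(g = ["a", "a", "b", "b"], x = [1.0, 3.0, 10.0, 20.0])

centered = @chain df begin
    @group_by(g)
    @mutate(group_mean = mean(x),       # one value per group, repeated on every row
            centered = x - mean(x))     # deviation from the group mean
    @ungroup
end
```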

Combining @group_by() with @summarize()

@chain movies begin
    @group_by(Year)
    @summarize(Mean_Yearly_Rating = mean(skipmissing(Rating)),
        Median_Yearly_Rating = median(skipmissing(Rating)))
    @slice(1:5)
end
5×3 DataFrame
 Row │ Year   Mean_Yearly_Rating  Median_Yearly_Rating
     │ Int32  Float64             Float64
─────┼─────────────────────────────────────────────────
   1 │  1971             5.66517                   5.8
   2 │  1939             6.35041                   6.4
   3 │  1941             6.34107                   6.4
   4 │  1996             5.74712                   5.9
   5 │  1975             5.62908                   5.7
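The sketch below uses a hypothetical toy data frame that contains a `missing` value. It shows why the examples above wrap the column in skipmissing(). Each group collapses to a single row, and the missing entry is ignored rather than propagated.

```julia
using TidierData
using DataFrames   # assumed installed
using Statistics   # mean, median

# Hypothetical toy data with one missing value in group "a"
df = DataFrame(g = ["a", "a", "a", "b"], x = [1.0, missing, 3.0, 5.0])

stats = @chain df begin
    @group_by(g)
    @summarize(avg = mean(skipmissing(x)),
               med = median(skipmissing(x)))
end
```

If you drop skipmissing(), `avg` for group "a" evaluates to `missing`, because `mean` propagates missing values.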

Grouping by multiple columns

@chain movies begin
  @group_by(Year, Comedy)
  @summarize(Mean_Yearly_Rating = mean(skipmissing(Rating)),
      Median_Yearly_Rating = median(skipmissing(Rating)))
  @ungroup # Need to ungroup to peel off grouping by Year
  @arrange(desc(Year), Comedy)
  @slice(1:5)
end
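5×4 DataFrame
 Row │ Year   Comedy  Mean_Yearly_Rating  Median_Yearly_Rating
     │ Int32  Int32   Float64             Float64
─────┼─────────────────────────────────────────────────────────
   1 │  2005       0             6.62788                  6.75
   2 │  2005       1             6.30081                   6.1
   3 │  2004       0             6.76521                   6.9
   4 │  2004       1             6.42898                   6.6
   5 │  2003       0             6.40409                   6.6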
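Here is a sketch of the "peeling off" behavior with two grouping columns, on hypothetical toy data. After @summarize(), the result is still grouped by the first column (`year`), which is why the example above calls @ungroup before @arrange(). The `groupcols` check assumes DataFrames.jl is loaded.

```julia
using TidierData
using DataFrames   # assumed installed; also provides groupcols
using Statistics   # mean

# Hypothetical toy data
df = DataFrame(year   = [2004, 2004, 2004, 2005, 2005, 2005],
               comedy = [0, 0, 1, 0, 1, 1],
               rating = [6.0, 8.0, 5.0, 7.0, 4.0, 6.0])

by_year = @chain df begin
    @group_by(year, comedy)
    @summarize(mean_rating = mean(rating))   # peels off `comedy`; still grouped by `year`
end

flat = @chain by_year begin
    @ungroup                                 # drop the remaining `year` grouping
    @arrange(desc(year), comedy)
end
```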

Combining @group_by() with @filter()

@chain movies begin
    @group_by(Year)
    @filter(Rating == minimum(Rating))
    @ungroup
    @select(Year, Rating)
    @arrange(desc(Year))
    @slice(1:10)
end
10×2 DataFrame
 Row │ Year   Rating
     │ Int32  Float64
─────┼───────────────
   1 │  2005      1.8
   2 │  2004      1.0
   3 │  2004      1.0
   4 │  2004      1.0
   5 │  2004      1.0
   6 │  2004      1.0
   7 │  2004      1.0
   8 │  2004      1.0
   9 │  2003      1.0
  10 │  2003      1.0
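Applying the same @group_by() plus @filter() pattern to hypothetical toy data shows how ties are handled. Every row equal to its group's extreme value is kept. This is why the output above contains several 2004 rows with a rating of 1.0. The sketch uses maximum() instead of minimum() purely for illustration.

```julia
using TidierData
using DataFrames   # assumed installed

# Hypothetical toy data; group "b" has a tie for its maximum
df = DataFrame(g = ["a", "a", "b", "b", "b"], x = [2, 5, 7, 1, 7])

top = @chain df begin
    @group_by(g)
    @filter(x == maximum(x))   # compared against each group's own maximum
    @ungroup
end
```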

This page was generated using Literate.jl.