@mutate

The primary purpose of @mutate() is to either create a new column or to update an existing column without changing the number of rows in the dataset. If you only plan to select the mutated columns, then you can use @transmute() instead of @mutate(). However, in Tidier.jl, @select() can also be used to create and select new columns (unlike R's tidyverse), which makes @transmute() redundant: it has the same functionality as @select(). @transmute() is included in Tidier.jl for convenience but is not strictly required.

using Tidier
using RDatasets

movies = dataset("ggplot2", "movies");

Using @mutate() to add a new column

Let's create a new column that contains the budget for each movie expressed in millions of dollars, and then select a handful of columns and rows for the sake of brevity. Notice that the underscores in 1_000_000 are strictly optional and included only for readability. Julia ignores underscores within numbers, so 1_000_000 is read exactly the same as 1000000.

@chain movies begin
  @filter(!ismissing(Budget))
  @mutate(Budget_Millions = Budget/1_000_000)
  @select(Title, Budget, Budget_Millions)
  @slice(1:5)
end
5×3 DataFrame
 Row │ Title                       Budget    Budget_Millions
     │ String                      Int32?    Float64
─────┼───────────────────────────────────────────────────────
   1 │ 'G' Men                       450000            0.45
   2 │ 'Manos' the Hands of Fate      19000            0.019
   3 │ 'Til There Was You          23000000           23.0
   4 │ .com for Murder              5000000            5.0
   5 │ 10 Things I Hate About You  16000000           16.0
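
As a quick check of the point about underscores, the following sketch uses only base Julia (no Tidier.jl functions) to confirm that both spellings denote the same integer. The variable names are illustrative and do not come from the original example.

```julia
# Underscores act purely as digit separators; both literals parse to the same Int.
with_underscores = 1_000_000
without_underscores = 1000000

println(with_underscores == without_underscores)  # true

# The first budget in the output above, converted to millions.
println(450000 / 1_000_000)                       # 0.45
```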

Using @mutate() to update an existing column

Here we will repeat the same exercise, except that we will overwrite the existing Budget column.

@chain movies begin
    @filter(!ismissing(Budget))
    @mutate(Budget = Budget/1_000_000)
    @select(Title, Budget)
    @slice(1:5)
end
5×2 DataFrame
 Row │ Title                       Budget
     │ String                      Float64
─────┼─────────────────────────────────────
   1 │ 'G' Men                       0.45
   2 │ 'Manos' the Hands of Fate     0.019
   3 │ 'Til There Was You           23.0
   4 │ .com for Murder               5.0
   5 │ 10 Things I Hate About You   16.0

Here's an example of using @mutate with in.

@chain movies begin
  @filter(!ismissing(Budget))
  @mutate(Nineties = Year in 1990:1999)
  @select(Title, Year, Nineties)
  @slice(1:5)
end
5×3 DataFrame
 Row │ Title                       Year   Nineties
     │ String                      Int32  Bool
─────┼─────────────────────────────────────────────
   1 │ 'G' Men                      1935     false
   2 │ 'Manos' the Hands of Fate    1966     false
   3 │ 'Til There Was You           1997      true
   4 │ .com for Murder              2002     false
   5 │ 10 Things I Hate About You   1999      true
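
The Bool column in the output above suggests that @mutate() applies in to each value of Year rather than to the column as a whole. As a rough base-Julia sketch of that element-wise behavior (an assumption about the effect, not a description of Tidier.jl's internals), the same values can be computed by broadcasting in over a vector. The years vector is copied from the output above.

```julia
# Years of the five movies shown in the output above.
years = [1935, 1966, 1997, 2002, 1999]

# Broadcast `in` over each year; Ref() keeps the range itself from being broadcast over.
nineties = in.(years, Ref(1990:1999))

println(nineties)  # Bool[0, 0, 1, 0, 1]
```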

Using @transmute to update and select columns

If we knew we wanted to select only the Title and Budget columns, we could have also used @transmute(), which (again) is just an alias for @select().

@chain movies begin
    @filter(!ismissing(Budget))
    @transmute(Title = Title, Budget = Budget/1_000_000)
    @slice(1:5)
end
5×2 DataFrame
 Row │ Title                       Budget
     │ String                      Float64
─────┼─────────────────────────────────────
   1 │ 'G' Men                       0.45
   2 │ 'Manos' the Hands of Fate     0.019
   3 │ 'Til There Was You           23.0
   4 │ .com for Murder               5.0
   5 │ 10 Things I Hate About You   16.0
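
Because this page describes @transmute() as an alias for @select(), the same pipeline could presumably be written with @select() directly. The sketch below relies on that statement holding for the Tidier.jl version documented here; other versions may behave differently. The name result is introduced only for illustration.

```julia
using Tidier
using RDatasets

movies = dataset("ggplot2", "movies");

# Assumed to be equivalent to the @transmute() call above,
# given that this page describes the two macros as aliases.
result = @chain movies begin
    @filter(!ismissing(Budget))
    @select(Title = Title, Budget = Budget/1_000_000)
    @slice(1:5)
end
```

If the alias holds, result should match the 5×2 DataFrame shown for @transmute().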

This page was generated using Literate.jl.