@mutate
The primary purpose of @mutate() is to create a new column or update an existing one without changing the number of rows in the dataset. If you only plan to keep the mutated columns, you can use @transmute() instead of @mutate(). However, unlike R's tidyverse, TidierData.jl lets @select() create new columns as well as select existing ones, so @transmute() is redundant: it has exactly the same functionality as @select(). @transmute() is included in TidierData.jl for convenience but is not strictly required.
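As a quick check of the row-preserving behavior described above, the sketch below applies @mutate() to a small, hypothetical three-row DataFrame (the data are made up for illustration, and the sketch assumes DataFrames.jl and TidierData.jl are installed). The result should have the same number of rows as the input, with the new column added after the existing ones.

```julia
using DataFrames
using TidierData

# Hypothetical three-row table, made up for illustration.
df = DataFrame(Title = ["A", "B", "C"], Budget = [450000, 19000, 23000000])

# @mutate adds a column but keeps every existing row and column.
m = @mutate(df, Budget_Millions = Budget / 1_000_000)

println(nrow(m))   # 3
println(names(m))  # ["Title", "Budget", "Budget_Millions"]
```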
```julia
using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");
```
Using @mutate() to add a new column
Let's create a new column that contains the budget for each movie expressed in millions of dollars, and then select a handful of columns and rows for the sake of brevity. Notice that the underscores in 1_000_000 are optional and included only for readability. Julia ignores underscores within numeric literals, so 1_000_000 is read exactly the same as 1000000.
```julia
@chain movies begin
    @filter(!ismissing(Budget))
    @mutate(Budget_Millions = Budget / 1_000_000)
    @select(Title, Budget, Budget_Millions)
    @slice(1:5)
end
```
| Row | Title (String) | Budget (Int32?) | Budget_Millions (Float64) |
|---|---|---|---|
| 1 | 'G' Men | 450000 | 0.45 |
| 2 | 'Manos' the Hands of Fate | 19000 | 0.019 |
| 3 | 'Til There Was You | 23000000 | 23.0 |
| 4 | .com for Murder | 5000000 | 5.0 |
| 5 | 10 Things I Hate About You | 16000000 | 16.0 |
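The two points in the paragraph above can be checked directly in Base Julia: the underscore-separated literal equals the plain one, and dividing integer budgets with / yields Float64 values, which is why Budget_Millions is a Float64 column. The budgets vector is a hypothetical excerpt of the first three non-missing Budget values, and the explicit ./ broadcast is written out only for illustration; it is not meant to show how TidierData.jl evaluates the expression internally.

```julia
# Underscores in numeric literals are visual separators only.
@assert 1_000_000 == 1000000

# Hypothetical stand-in for a few non-missing Budget values (Int32, as in the data).
budgets = Int32[450000, 19000, 23000000]

# `/` on integers always returns a floating-point result, so the broadcast
# division yields a Vector{Float64}, matching the Float64 Budget_Millions column.
budget_millions = budgets ./ 1_000_000

println(budget_millions)          # [0.45, 0.019, 23.0]
println(eltype(budget_millions))  # Float64
```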
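Using @mutate() to update an existing column
Here we repeat the same exercise, except that we overwrite the existing Budget column instead of creating a new one. Because the new values come from division, the column's type changes from Int32? to Float64.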
```julia
@chain movies begin
    @filter(!ismissing(Budget))
    @mutate(Budget = Budget / 1_000_000)
    @select(Title, Budget)
    @slice(1:5)
end
```
| Row | Title (String) | Budget (Float64) |
|---|---|---|
| 1 | 'G' Men | 0.45 |
| 2 | 'Manos' the Hands of Fate | 0.019 |
| 3 | 'Til There Was You | 23.0 |
| 4 | .com for Murder | 5.0 |
| 5 | 10 Things I Hate About You | 16.0 |
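The difference between adding and overwriting a column can be mimicked in Base Julia with a NamedTuple of column vectors. This is only an analogy, not how TidierData.jl stores data: merge() replaces a field that already exists under the same name instead of adding a second one, much as @mutate(Budget = ...) does. The two-row table is a hypothetical excerpt.

```julia
# Hypothetical two-row table stored as a NamedTuple of column vectors.
tbl = (Title = ["'G' Men", "'Manos' the Hands of Fate"],
       Budget = Int32[450000, 19000])

# Overwriting Budget: merge keeps the field order of `tbl` and replaces the
# existing Budget values, so the number of columns stays the same.
tbl2 = merge(tbl, (Budget = tbl.Budget ./ 1_000_000,))

println(keys(tbl2))           # (:Title, :Budget)
println(eltype(tbl2.Budget))  # Float64
```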
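Using @mutate() with in
Here's an example of using @mutate() with in. The expression Year in 1990:1999 is true for each movie released between 1990 and 1999 (inclusive), producing a Bool column.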
```julia
@chain movies begin
    @filter(!ismissing(Budget))
    @mutate(Nineties = Year in 1990:1999)
    @select(Title, Year, Nineties)
    @slice(1:5)
end
```
| Row | Title (String) | Year (Int32) | Nineties (Bool) |
|---|---|---|---|
| 1 | 'G' Men | 1935 | false |
| 2 | 'Manos' the Hands of Fate | 1966 | false |
| 3 | 'Til There Was You | 1997 | true |
| 4 | .com for Murder | 2002 | false |
| 5 | 10 Things I Hate About You | 1999 | true |
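In plain Julia, in applied to a single value and a range returns one Bool; applying it to every element of a vector needs an explicit broadcast. The Base Julia sketch below spells that out using the five Year values from the table above. It only illustrates the per-row logic; inside @mutate(), TidierData.jl appears to handle the row-wise application for you, as the Bool column shows.

```julia
# Membership in an integer UnitRange is a simple bounds check (first <= x <= last).
@assert 1997 in 1990:1999
@assert !(2002 in 1990:1999)

# Year values copied from the five rows shown above.
years = Int32[1935, 1966, 1997, 2002, 1999]

# Row-wise version: broadcast `in` over the years. Ref() wraps the range so
# broadcasting treats it as a single value instead of iterating over it.
nineties = in.(years, Ref(1990:1999))
```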
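Using @mutate() with n() and row_number()
Here's an example of using @mutate() with both n() and row_number(). Within @mutate(), n() and row_number() are created as temporary columns, which means that they can be used inside expressions. Because @mutate() runs before @filter() in this chain, Row_Num records each movie's position in the full dataset and Total_Rows counts every row (58,788), including rows that are later dropped for having a missing Budget.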
```julia
@chain movies begin
    # row_number() and n() are evaluated here, on all rows, before filtering
    @mutate(Row_Num = row_number(),
            Total_Rows = n())
    @filter(!ismissing(Budget))
    @select(Title, Year, Row_Num, Total_Rows)
    @slice(1:5)
end
```
| Row | Title (String) | Year (Int32) | Row_Num (Int64) | Total_Rows (Int64) |
|---|---|---|---|---|
| 1 | 'G' Men | 1935 | 22 | 58788 |
| 2 | 'Manos' the Hands of Fate | 1966 | 35 | 58788 |
| 3 | 'Til There Was You | 1997 | 48 | 58788 |
| 4 | .com for Murder | 2002 | 91 | 58788 |
| 5 | 10 Things I Hate About You | 1999 | 112 | 58788 |
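To see why Row_Num jumps (22, 35, 48, ...) and Total_Rows stays at 58788, here is a Base Julia sketch of computing both values on every row first and filtering afterwards. The five-element budget vector is hypothetical, and collect(1:length(...)) and fill(...) merely stand in for row_number() and n(); they are not how TidierData.jl implements them.

```julia
# Hypothetical Budget column with some missing values.
budget = [missing, 450000, missing, 19000, 23000000]

# Computed first, on all rows (analogous to the @mutate step):
row_num = collect(1:length(budget))                # like row_number()
total_rows = fill(length(budget), length(budget))  # like n(), repeated per row

# Filtering afterwards keeps the original positions and the original count.
keep = .!ismissing.(budget)
println(row_num[keep])     # [2, 4, 5]
println(total_rows[keep])  # [5, 5, 5]
```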
Using @transmute() to update and select columns
If we knew we wanted to keep only the Title and Budget columns, we could also have used @transmute(), which, as noted above, is just an alias for @select().
```julia
@chain movies begin
    @filter(!ismissing(Budget))
    @transmute(Title = Title, Budget = Budget / 1_000_000)
    @slice(1:5)
end
```
| Row | Title (String) | Budget (Float64) |
|---|---|---|
| 1 | 'G' Men | 0.45 |
| 2 | 'Manos' the Hands of Fate | 0.019 |
| 3 | 'Til There Was You | 23.0 |
| 4 | .com for Murder | 5.0 |
| 5 | 10 Things I Hate About You | 16.0 |
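Since @transmute() is described as an alias for @select(), the two calls in the sketch below should return identical results. The three-column DataFrame is a hypothetical stand-in for movies, and the sketch assumes DataFrames.jl and TidierData.jl are installed. Note that the Year column is dropped by both calls.

```julia
using DataFrames
using TidierData

# Hypothetical stand-in for a few rows of the movies data.
df = DataFrame(Title = ["'G' Men", "'Manos' the Hands of Fate"],
               Budget = [450000, 19000],
               Year = [1935, 1966])

# @transmute keeps only the listed columns, computing Budget on the way.
a = @transmute(df, Title = Title, Budget = Budget / 1_000_000)

# @select accepts the same `name = expression` arguments.
b = @select(df, Title = Title, Budget = Budget / 1_000_000)

println(a == b)  # true
```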
This page was generated using Literate.jl.