Movies dataset
To get started, we will load the `movies` dataset from the RDatasets.jl package.
using TidierData
using RDatasets
movies = dataset("ggplot2", "movies");
To work with this dataset, we will use the `@chain` macro. This macro initiates a pipe: every function or macro listed between `begin` and `end` operates on the result of the previous step, starting with the data frame named at the beginning of the pipe. You don't necessarily have to spread a chain over multiple lines of code, but when working with data frames it's often easiest to do so. Before going further, take a look at the Chain.jl GitHub page to see everything that is possible with it, including mid-chain side effects using `@aside` and mid-chain assignment of variables.
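As a minimal sketch of how the pipe behaves, the following uses Chain.jl on a plain vector rather than on `movies`; the vector and the printed message are illustrative only. By default each step receives the previous result as its first argument, `_` places it somewhere else, and `@aside` runs a step without passing its value on.

```julia
using Chain  # TidierData re-exports @chain; Chain.jl is loaded directly so the sketch stands alone

total = @chain [1, 2, 3, 4, 5] begin
    filter(isodd, _)                                   # `_` marks where the previous result goes
    @aside println("kept ", length(_), " odd values")  # side effect only; its return value is discarded
    sum                                                # bare function: previous result becomes its first argument
end

# The same pipe written on a single line, without begin/end
same_total = @chain [1, 2, 3, 4, 5] filter(isodd, _) sum
```

Both forms should give the same result; the multi-line form simply leaves room for comments and for steps such as `@aside`.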
Let's take a look at the first 5 rows of the `movies` dataset using `@slice()`.
@chain movies begin
    @slice(1:5)
end
| Row | Title | Year | Length | Budget | Rating | Votes | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 | MPAA | Action | Animation | Comedy | Drama | Documentary | Romance | Short |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | String | Int32 | Int32 | Int32? | Float64 | Int32 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Cat… | Int32 | Int32 | Int32 | Int32 | Int32 | Int32 | Int32 |
| 1 | $ | 1971 | 121 | missing | 6.4 | 348 | 4.5 | 4.5 | 4.5 | 4.5 | 14.5 | 24.5 | 24.5 | 14.5 | 4.5 | 4.5 |  | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 2 | $1000 a Touchdown | 1939 | 71 | missing | 6.0 | 20 | 0.0 | 14.5 | 4.5 | 24.5 | 14.5 | 14.5 | 14.5 | 4.5 | 4.5 | 14.5 |  | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | $21 a Day Once a Month | 1941 | 7 | missing | 8.2 | 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 24.5 | 0.0 | 44.5 | 24.5 | 24.5 |  | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | $40,000 | 1996 | 70 | missing | 8.2 | 6 | 14.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 34.5 | 45.5 |  | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 5 | $50,000 Climax Show, The | 1975 | 71 | missing | 3.4 | 17 | 24.5 | 4.5 | 0.0 | 14.5 | 14.5 | 4.5 | 0.0 | 0.0 | 0.0 | 24.5 |  | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
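For comparison, `@slice(1:5)` selects rows by position, so it should agree with `first(df, 5)` from DataFrames.jl. The sketch below checks this on a small made-up data frame (`toy` and its columns are hypothetical) so that it runs without downloading the dataset; it assumes DataFrames.jl is installed alongside TidierData.

```julia
using TidierData
using DataFrames  # assumed to be available in the environment

# Hypothetical toy data standing in for `movies`
toy = DataFrame(id = 1:10, squared = (1:10) .^ 2)

top5 = @chain toy begin
    @slice(1:5)   # keep rows 1 through 5 by position
end

# Compare against the plain DataFrames.jl way of taking the first five rows
same_rows = top5 == first(toy, 5)
```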
Let's use `@glimpse()` to preview the dataset.
@glimpse(movies)
Rows: 58788
Columns: 24
.Title String $, $1000 a Touchdown, $21 a Day Once a Month, $40,
.Year Int32 1971, 1939, 1941, 1996, 1975, 2000, 2002, 2002, 19
.Length Int32 121, 71, 7, 70, 71, 91, 93, 25, 97, 61, 99, 96, 10
.Budget Union{Missing, Int32}missing, missing, missing, missing, missing,
.Rating Float64 6.4, 6.0, 8.2, 8.2, 3.4, 4.3, 5.3, 6.7, 6.6, 6.0,
.Votes Int32 348, 20, 5, 6, 17, 45, 200, 24, 18, 51, 23, 53, 44
.R1 Float64 4.5, 0.0, 0.0, 14.5, 24.5, 4.5, 4.5, 4.5, 4.5, 4.5
.R2 Float64 4.5, 14.5, 0.0, 0.0, 4.5, 4.5, 0.0, 4.5, 4.5, 0.0,
.R3 Float64 4.5, 4.5, 0.0, 0.0, 0.0, 4.5, 4.5, 4.5, 4.5, 4.5,
.R4 Float64 4.5, 24.5, 0.0, 0.0, 14.5, 14.5, 4.5, 4.5, 0.0, 4.
.R5 Float64 14.5, 14.5, 0.0, 0.0, 14.5, 14.5, 24.5, 4.5, 0.0,
.R6 Float64 24.5, 14.5, 24.5, 0.0, 4.5, 14.5, 24.5, 14.5, 0.0,
.R7 Float64 24.5, 14.5, 0.0, 0.0, 0.0, 4.5, 14.5, 14.5, 34.5,
.R8 Float64 14.5, 4.5, 44.5, 0.0, 0.0, 4.5, 4.5, 14.5, 14.5, 4
.R9 Float64 4.5, 4.5, 24.5, 34.5, 0.0, 14.5, 4.5, 4.5, 4.5, 4.
.R10 Float64 4.5, 14.5, 24.5, 45.5, 24.5, 14.5, 14.5, 14.5, 24.
.MPAA CategoricalArrays.CategoricalValue{String, UInt8}, , , , , , R, ,
.Action Int32 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
.Animation Int32 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
.Comedy Int32 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0,
.Drama Int32 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
.Documentary Int32 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
.Romance Int32 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
.Short Int32 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0,
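The figures reported by `@glimpse()` can be cross-checked with ordinary Base calls on the data frame. A brief sketch, loading `movies` the same way as at the top of this page; the variable names are illustrative only.

```julia
using RDatasets

movies = dataset("ggplot2", "movies")

dims = size(movies)                                  # (rows, columns); the glimpse reports 58788 and 24
budget_type = eltype(movies.Budget)                  # element type of the Budget column, which allows missing values
n_missing_budget = count(ismissing, movies.Budget)   # number of films with no recorded budget
```

Because `Budget` contains missing values, any summary of that column has to decide how to handle them, which is why checking `n_missing_budget` early is useful.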