Movies dataset

To get started, we will load the movies dataset from the RDatasets.jl package.

using Tidier
using RDatasets

movies = dataset("ggplot2", "movies");

To work with this dataset, we will use the @chain macro. This macro starts a pipe: each function or macro call listed between begin and end receives the result of the previous step, starting with the data frame named at the top of the chain. You don't necessarily have to spread a chain over multiple lines of code, but when working with data frames it's often easiest to do so. Before going further, take a look at the Chain.jl GitHub page to see what else is possible, including mid-chain side effects using @aside and mid-chain assignment of variables.
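
As a rough illustration of the two Chain.jl features mentioned above, here is a minimal sketch on a plain vector rather than on the movies data. It assumes the Chain.jl package is installed; the names result and doubled are made up for this example.

```julia
using Chain

# Pipe a small vector through a few steps.
result = @chain [1, 2, 3, 4] begin
    filter(isodd, _)                    # keep the odd values: [1, 3]
    @aside println("odd values: ", _)   # side effect only; the piped value is unchanged
    doubled = map(x -> 2x, _)           # mid-chain assignment; the chain continues with this value
    sum                                 # a bare function name receives the previous result
end

println(result)
```

The underscore marks where the previous result is spliced in. According to the Chain.jl README, a name assigned mid-chain (doubled here) remains available after the chain has finished.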

Let's take a look at the first 5 rows of the movies dataset using @slice().

@chain movies begin
    @slice(1:5)
end

5×24 DataFrame

Row	Title	Year	Length	Budget	Rating	Votes	R1	R2	R3	R4	R5	R6	R7	R8	R9	R10	MPAA	Action	Animation	Comedy	Drama	Documentary	Romance	Short
	String	Int32	Int32	Int32?	Float64	Int32	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Cat…	Int32	Int32	Int32	Int32	Int32	Int32	Int32
1	$	1971	121	missing	6.4	348	4.5	4.5	4.5	4.5	14.5	24.5	24.5	14.5	4.5	4.5		0	0	1	1	0	0	0
2	$1000 a Touchdown	1939	71	missing	6.0	20	0.0	14.5	4.5	24.5	14.5	14.5	14.5	4.5	4.5	14.5		0	0	1	0	0	0	0
3	$21 a Day Once a Month	1941	7	missing	8.2	5	0.0	0.0	0.0	0.0	0.0	24.5	0.0	44.5	24.5	24.5		0	1	0	0	0	0	1
4	$40,000	1996	70	missing	8.2	6	14.5	0.0	0.0	0.0	0.0	0.0	0.0	0.0	34.5	45.5		0	0	1	0	0	0	0
5	$50,000 Climax Show, The	1975	71	missing	3.4	17	24.5	4.5	0.0	14.5	14.5	4.5	0.0	0.0	0.0	24.5		0	0	0	0	0	0	0

Let's use the describe() function, which is re-exported from the DataFrames.jl package, to summarize each column of the dataset.

describe(movies)

24×7 DataFrame

Row	variable	mean	min	median	max	nmissing	eltype
	Symbol	Union…	Any	Union…	Any	Int64	Type
1	Title		$		xXx: State of the Union	0	String
2	Year	1976.13	1893	1983.0	2005	0	Int32
3	Length	82.3379	1	90.0	5220	0	Int32
4	Budget	1.34125e7	0	3.0e6	200000000	53573	Union{Missing, Int32}
5	Rating	5.93285	1.0	6.1	10.0	0	Float64
6	Votes	632.13	5	30.0	157608	0	Int32
7	R1	7.01438	0.0	4.5	100.0	0	Float64
8	R2	4.02238	0.0	4.5	84.5	0	Float64
9	R3	4.72116	0.0	4.5	84.5	0	Float64
10	R4	6.37485	0.0	4.5	100.0	0	Float64
11	R5	9.79669	0.0	4.5	100.0	0	Float64
12	R6	13.0392	0.0	14.5	84.5	0	Float64
13	R7	15.5481	0.0	14.5	100.0	0	Float64
14	R8	13.876	0.0	14.5	100.0	0	Float64
15	R9	8.95421	0.0	4.5	100.0	0	Float64
16	R10	16.854	0.0	14.5	100.0	0	Float64
17	MPAA				R	0	CategoricalValue{String, UInt8}
18	Action	0.0797442	0	0.0	1	0	Int32
19	Animation	0.0627679	0	0.0	1	0	Int32
20	Comedy	0.293784	0	0.0	1	0	Int32
21	Drama	0.371011	0	0.0	1	0	Int32
22	Documentary	0.0590597	0	0.0	1	0	Int32
23	Romance	0.0806967	0	0.0	1	0	Int32
24	Short	0.160883	0	0.0	1	0	Int32
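
The full summary above is wide. If only a few statistics are of interest, describe() in DataFrames.jl also accepts the statistic names as symbols. The sketch below uses a small made-up data frame (df, with hypothetical columns a and b) so that it runs without RDatasets.jl; it assumes DataFrames.jl is installed.

```julia
using DataFrames

# A toy data frame with one missing value in column a.
df = DataFrame(a = [1, 2, missing], b = [1.0, 2.5, 4.0])

# Request only the mean and the number of missing values per column.
stats = describe(df, :mean, :nmissing)

println(stats)
```

The same call shape should work on the movies data frame, for example describe(movies, :mean, :nmissing), which keeps the output to three columns.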

This page was generated using Literate.jl.