Movies dataset

To get started, we will load the movies dataset from the RDatasets.jl package.

using Tidier
using RDatasets

movies = dataset("ggplot2", "movies");

To work with this dataset, we will use the @chain macro. This macro initiates a pipe: every function or macro placed between begin and end operates on the result of the previous step, starting with the data frame named at the top of the chain. You don't necessarily have to spread a chain over multiple lines of code, but when working with data frames it's often easiest to do so. Before going further, take a look at the Chain.jl GitHub page to see everything that is possible with chains, including mid-chain side effects using @aside and mid-chain assignment of variables.
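As a minimal sketch of how a chain passes values along, the example below uses Chain.jl directly on a plain vector rather than on the movies data. The vector and the variable names total and total_inline are made up for illustration; the one-line form relies on Chain.jl's one-liner syntax.

```julia
using Chain

# Multi-line chain: each step receives the previous result,
# either wherever `_` appears or as the first argument.
total = @chain [1, 2, 3, 4, 5, 6] begin
    filter(iseven, _)                              # keep 2, 4 and 6
    @aside println("kept ", length(_), " values")  # side effect only; the value passes through unchanged
    sum                                            # bare function name: called on the previous result
end

# The same pipeline on a single line, without begin/end.
total_inline = @chain [1, 2, 3, 4, 5, 6] filter(iseven, _) sum
```

The @aside line shows how to inspect an intermediate result without interrupting the pipe, which is handy when debugging a longer chain.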

Let's take a look at the first 5 rows of the movies dataset using @slice().

@chain movies begin
    @slice(1:5)
end
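5×24 DataFrame
 Row │ Title                     Year   Length  Budget   Rating   Votes
     │ String                    Int32  Int32   Int32?   Float64  Int32
─────┼─────────────────────────────────────────────────────────────────
   1 │ $                          1971     121  missing      6.4    348
   2 │ $1000 a Touchdown          1939      71  missing      6.0     20
   3 │ $21 a Day Once a Month     1941       7  missing      8.2      5
   4 │ $40,000                    1996      70  missing      8.2      6
   5 │ $50,000 Climax Show, The   1975      71  missing      3.4     17

 Row │ R1       R2       R3       R4       R5       R6       R7       R8       R9       R10
     │ Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │     4.5      4.5      4.5      4.5     14.5     24.5     24.5     14.5      4.5      4.5
   2 │     0.0     14.5      4.5     24.5     14.5     14.5     14.5      4.5      4.5     14.5
   3 │     0.0      0.0      0.0      0.0      0.0     24.5      0.0     44.5     24.5     24.5
   4 │    14.5      0.0      0.0      0.0      0.0      0.0      0.0      0.0     34.5     45.5
   5 │    24.5      4.5      0.0     14.5     14.5      4.5      0.0      0.0      0.0     24.5

 Row │ MPAA  Action  Animation  Comedy  Drama  Documentary  Romance  Short
     │ Cat…  Int32   Int32      Int32   Int32  Int32        Int32    Int32
─────┼────────────────────────────────────────────────────────────────────
   1 │            0          0       1      1            0        0      0
   2 │            0          0       1      0            0        0      0
   3 │            0          1       0      0            0        0      1
   4 │            0          0       1      0            0        0      0
   5 │            0          0       0      0            0        0      0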
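For comparison, here is a sketch of two plain DataFrames.jl ways to take the first five rows of a data frame, which should select the same rows as @slice(1:5). The toy data frame and its column names are hypothetical and stand in for the movies dataset.

```julia
using DataFrames

# Hypothetical toy data, not the movies dataset.
toy = DataFrame(id = 1:8, score = [6.4, 6.0, 8.2, 8.2, 3.4, 4.3, 5.3, 6.7])

head_rows    = first(toy, 5)   # first five rows
indexed_rows = toy[1:5, :]     # the same rows via explicit row indexing (returns a copy)
```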

Let's use the describe() function, which is re-exported from the DataFrames.jl package, to summarize each column of the dataset.

describe(movies)
24×7 DataFrame
 Row │ variable     mean       min   median  max                      nmissing  eltype
     │ Symbol       Union…     Any   Union…  Any                      Int64     Type
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Title                   $             xXx: State of the Union         0  String
   2 │ Year         1976.13    1893  1983.0  2005                            0  Int32
   3 │ Length       82.3379    1     90.0    5220                            0  Int32
   4 │ Budget       1.34125e7  0     3.0e6   200000000                   53573  Union{Missing, Int32}
   5 │ Rating       5.93285    1.0   6.1     10.0                            0  Float64
   6 │ Votes        632.13     5     30.0    157608                          0  Int32
   7 │ R1           7.01438    0.0   4.5     100.0                           0  Float64
   8 │ R2           4.02238    0.0   4.5     84.5                            0  Float64
   9 │ R3           4.72116    0.0   4.5     84.5                            0  Float64
  10 │ R4           6.37485    0.0   4.5     100.0                           0  Float64
  11 │ R5           9.79669    0.0   4.5     100.0                           0  Float64
  12 │ R6           13.0392    0.0   14.5    84.5                            0  Float64
  13 │ R7           15.5481    0.0   14.5    100.0                           0  Float64
  14 │ R8           13.876     0.0   14.5    100.0                           0  Float64
  15 │ R9           8.95421    0.0   4.5     100.0                           0  Float64
  16 │ R10          16.854     0.0   14.5    100.0                           0  Float64
  17 │ MPAA                                  R                               0  CategoricalValue{String, UInt8}
  18 │ Action       0.0797442  0     0.0     1                               0  Int32
  19 │ Animation    0.0627679  0     0.0     1                               0  Int32
  20 │ Comedy       0.293784   0     0.0     1                               0  Int32
  21 │ Drama        0.371011   0     0.0     1                               0  Int32
  22 │ Documentary  0.0590597  0     0.0     1                               0  Int32
  23 │ Romance      0.0806967  0     0.0     1                               0  Int32
  24 │ Short        0.160883   0     0.0     1                               0  Int32
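When only a few of these summary columns are of interest, describe() also accepts the names of the statistics to compute. The sketch below uses a small hypothetical data frame (the column names runtime and budget and their values are invented) instead of the movies dataset.

```julia
using DataFrames

# Hypothetical toy data; `budget` contains missing values, like Budget in movies.
toy = DataFrame(runtime = [121, 71, 7, 70],
                budget  = [missing, 1_000, 3_000, missing])

# Request only the mean and the number of missing values for each column.
summary_stats = describe(toy, :mean, :nmissing)
```

The result is itself a data frame with one row per column of toy, so it can be passed into a chain like any other data frame.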

This page was generated using Literate.jl.