Setting Options

TidierData.jl comes with two settings that make it easier to understand the transformations that are being applied to a data frame and to troubleshoot errors. These settings are log and code. The log setting outputs information about the data frame after each transformation, including the number of missing values and the number of unique values in each column. The code setting outputs the code that is being executed by the TidierData.jl macros. By default, both settings are set to false. This page will review the log and code settings using the movies dataset.

We recommend enabling log in general, and especially while you are first learning TidierData.jl, because it shows how the data frame changes at each step. The code setting is useful for debugging errors in TidierData.jl chains.

using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");

log

Logging is set to false by default but can be enabled as follows:

TidierData_set("log", true)
true

When enabled, each macro called will show information about its transformation of the data. Logging can be especially useful to catch silent bugs (those that do not result in an error).

When column values are changed, the log reports the number of new missing values, the percentage of missing values, and the number of unique values.

@chain movies begin
    @filter(Year > 2000)
    @mutate(Budget_cat = case_when(Budget > 18000 => "high",
                                   Budget > 2000  => "medium",
                                   Budget > 100   => "low",
                                   true           => missing))
    @filter(!ismissing(Budget))
    @group_by(Year, Budget_cat)
    @summarize(Avg_Budget = mean(Budget), n = n())
    @ungroup
    @arrange(n)
end
20×4 DataFrame
 Row │ Year   Budget_cat  Avg_Budget  n
     │ Int32  String?     Float64     Int64
─────┼─────────────────────────────────────
   1 │  2005  low             2000.0      1
   2 │  2002  missing            0.0      2
   3 │  2001  missing            0.0      3
   4 │  2001  low             1425.0      4
   5 │  2005  missing            0.0      4
   6 │  2004  missing            0.0      4
   7 │  2002  low             1500.0      7
   8 │  2003  missing            0.0      7
   9 │  2005  medium         8249.94     16
  10 │  2003  low            1443.48     23
  11 │  2001  medium          9580.0     25
  12 │  2004  low             1308.0     25
  13 │  2002  medium         7815.33     30
  14 │  2003  medium         8027.28     67
  15 │  2005  high          2.0684e7     82
  16 │  2004  medium         7946.05     91
  17 │  2003  high         2.14431e7    276
  18 │  2001  high         2.13646e7    289
  19 │  2002  high         2.17604e7    320
  20 │  2004  high         1.88698e7    336
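
As a smaller illustration of the silent-bug case mentioned above, the following sketch uses a hypothetical five-row data frame. The `id` and `score` columns and the `score_frac` name are made up for this example and are not part of the movies dataset. It assumes that TidierData.jl re-exports `DataFrame` (add `using DataFrames` if it does not in your version) and that `@filter` drops rows where the condition evaluates to missing. Under those assumptions, the two rows with a missing score disappear along with the row that fails the test, and no error is raised. With logging enabled, the message printed for `@filter` should make the dropped rows visible. The log output itself is not reproduced here.

```julia
using TidierData

TidierData_set("log", true)  # same setting as above

# Hypothetical toy data: two of the five scores are missing.
df = DataFrame(id = 1:5, score = [90, 85, missing, 70, missing])

result = @chain df begin
    @filter(score > 80)                # rows with a missing score are assumed to be dropped here
    @mutate(score_frac = score / 100)  # adds a column, which the log should summarize
end
```

Comparing the row count reported after `@filter` with the size of `df` is one way to notice that more rows were removed than the condition alone suggests.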

Logging can also be disabled.

TidierData_set("log", false) # disable logging
false

code

Code printing is set to false by default. Enabling this setting prints the underlying DataFrames.jl code created by TidierData.jl macros. It can be useful for debugging, especially for users who understand DataFrames.jl syntax, or for filing bug reports.

TidierData_set("code", true) # enable macro code output

@chain movies begin
    @select(Title, Year, Budget)
    @slice_sample(n = 10)
end
10×3 DataFrame
 Row │ Title                                            Year   Budget
     │ String                                           Int32  Int32?
─────┼────────────────────────────────────────────────────────────────
   1 │ San Sebastian 1746 in 1968                        1968  missing
   2 │ Road Agent                                        1952  missing
   3 │ Best of Walt Disney's True-Life Adventures, The   1975  missing
   4 │ 14, The                                           1973  missing
   5 │ This Time for Keeps                               1947  missing
   6 │ David e Golia                                     1960  missing
   7 │ Guerra del cerdo, La                              1975  missing
   8 │ Schlafes Bruder                                   1995  missing
   9 │ Mating Season, The                                1951  missing
  10 │ Captain's Christmas, The                          1938  missing

Code printing can also be disabled.

TidierData_set("code", false) # disable macro code output
false

This page was generated using Literate.jl.