
Setting Options

TidierData.jl comes with two settings that make it easier to understand the transformations that are being applied to a data frame and to troubleshoot errors. These settings are log and code. The log setting outputs information about the data frame after each transformation, including the number of missing values and the number of unique values in each column. The code setting outputs the code that is being executed by the TidierData.jl macros. By default, both settings are set to false. This page will review the log and code settings using the movies dataset.

We recommend enabling log in general, and especially when you are first learning TidierData.jl, because it shows how the data frame is transformed at each step. The code setting is useful for debugging errors in TidierData.jl chains.

using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");

log

Logging is set to false by default but can be enabled as follows:

TidierData_set("log", true)
true

When enabled, each macro called will show information about its transformation of the data. Logging can be especially useful to catch silent bugs (those that do not result in an error).

When column values change, the log reports the number of new missing values, the percentage of missing values, and the number of unique values.

@chain movies begin
    @filter(Year > 2000)
    @mutate(Budget_cat = case_when(Budget > 18000 => "high",
                                   Budget > 2000  => "medium",
                                   Budget > 100   => "low",
                                   true           => missing))
    @filter(!ismissing(Budget))
    @group_by(Year, Budget_cat)
    @summarize(Avg_Budget = mean(Budget), n = n())
    @ungroup
    @arrange(n)
end
20×4 DataFrame
 Row │ Year   Budget_cat  Avg_Budget  n
     │ Int32  String?     Float64     Int64
─────┼─────────────────────────────────────
   1 │  2005  low             2000.0      1
   2 │  2002  missing            0.0      2
   3 │  2001  missing            0.0      3
   4 │  2001  low             1425.0      4
   5 │  2005  missing            0.0      4
   6 │  2004  missing            0.0      4
   7 │  2002  low             1500.0      7
   8 │  2003  missing            0.0      7
   9 │  2005  medium         8249.94     16
  10 │  2003  low            1443.48     23
  11 │  2001  medium          9580.0     25
  12 │  2004  low             1308.0     25
  13 │  2002  medium         7815.33     30
  14 │  2003  medium         8027.28     67
  15 │  2005  high          2.0684e7     82
  16 │  2004  medium         7946.05     91
  17 │  2003  high         2.14431e7    276
  18 │  2001  high         2.13646e7    289
  19 │  2002  high         2.17604e7    320
  20 │  2004  high         1.88698e7    336

Logging can also be disabled.

TidierData_set("log", false) # disable logging
false
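To make the "silent bug" case concrete, here is a minimal sketch on a small, made-up data frame (the `df`, `id`, `score`, and `grade` names are hypothetical and not part of the movies dataset). The `case_when` call has no branch for low scores, so one row quietly becomes `missing`; with logging enabled, the `@mutate` step is where the report on new missing values described above would be expected to surface this. The exact log text is not reproduced here because it may vary between TidierData.jl versions.

```julia
using TidierData
using DataFrames  # assumption: loaded explicitly for the DataFrame constructor and nrow

# Hypothetical toy data, not part of the movies dataset
df = DataFrame(id = 1:5, score = [90, 55, missing, 72, 10])

TidierData_set("log", true)   # turn logging on before running the chain

result = @chain df begin
    @filter(!ismissing(score))
    # No branch covers scores below 50, so those rows fall through to missing
    @mutate(grade = case_when(score >= 80 => "high",
                              score >= 50 => "medium",
                              true        => missing))
end

TidierData_set("log", false)  # restore the default
```

Setting the option immediately before the chain and resetting it afterwards keeps the extra output limited to the steps being inspected.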

code

Code printing is set to false by default. Enabling this setting prints the underlying DataFrames.jl code created by TidierData.jl macros. It can be useful for debugging, especially for users who understand DataFrames.jl syntax, or for filing bug reports.

TidierData_set("code", true) # enable macro code output

@chain movies begin
    @select(Title, Year, Budget)
    @slice_sample(n = 10)
end
10×3 DataFrame
 Row │ Title                                  Year   Budget
     │ String                                 Int32  Int32?
─────┼──────────────────────────────────────────────────────
   1 │ Dangerous Davies - The Last Detective   1981  missing
   2 │ Stolen Heart                            1998  missing
   3 │ Don't Go Breaking My Heart              1999  missing
   4 │ Transylvania 6-5000                     1985  missing
   5 │ Amores                                  1998  missing
   6 │ Polar                                   1984  missing
   7 │ Angel                                   1966  missing
   8 │ Criminally Insane 2                     1987  missing
   9 │ Burgtheater                             1936  missing
  10 │ Carrier, The                            1988  missing

Code printing can also be disabled.

TidierData_set("code", false) # disable macro code output
false
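When preparing a bug report, it can help to reproduce the problem on a tiny data frame rather than a full dataset. The following is a minimal sketch under that assumption; the `df`, `title`, `released`, and `budget` names are made up for illustration. The printed DataFrames.jl code is not shown here, since its exact form depends on the installed TidierData.jl version.

```julia
using TidierData
using DataFrames  # assumption: loaded explicitly for the DataFrame constructor

# Hypothetical toy data for a small reproducible example
df = DataFrame(title    = ["A", "B", "C", "D"],
               released = [1999, 2003, 2010, 2021],
               budget   = [missing, 5000, missing, 120000])

TidierData_set("code", true)   # print the generated code for the chain below

recent = @chain df begin
    @select(title, released)
    @filter(released > 2000)
end

TidierData_set("code", false)  # restore the default
```

The chain still returns its usual result, so the printed code can be compared directly against the data frame it produces.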

This page was generated using Literate.jl.