Setting Options
TidierData.jl comes with two settings that make it easier to understand the transformations that are being applied to a data frame and to troubleshoot errors. These settings are log and code. The log setting outputs information about the data frame after each transformation, including the number of missing values and the number of unique values in each column. The code setting outputs the code that is being executed by the TidierData.jl macros. By default, both settings are set to false. This page will review the log and code settings using the movies dataset.
We recommend setting log to true in general, and especially when you are first learning TidierData.jl. This will help you understand how the data frame is being transformed at each step. The code setting is useful for debugging errors in TidierData.jl chains.
using TidierData
using RDatasets
movies = dataset("ggplot2", "movies");
log
Logging is set to false by default but can be enabled as follows:
TidierData_set("log", true)
true
When enabled, each macro called will show information about its transformation of the data. Logging can be especially useful to catch silent bugs (those that do not result in an error).
When column values are changed, the log reports the number of new missing values, the percentage of missing values, and the number of unique values.
@chain movies begin
    @filter(Year > 2000)
    @mutate(Budget_cat = case_when(Budget > 18000 => "high",
                                   Budget > 2000 => "medium",
                                   Budget > 100 => "low",
                                   true => missing))
    @filter(!ismissing(Budget))
    @group_by(Year, Budget_cat)
    @summarize(Avg_Budget = mean(Budget), n = n())
    @ungroup
    @arrange(n)
end
Row | Year | Budget_cat | Avg_Budget | n |
---|---|---|---|---|
 | Int32 | String? | Float64 | Int64 |
1 | 2005 | low | 2000.0 | 1 |
2 | 2002 | missing | 0.0 | 2 |
3 | 2001 | missing | 0.0 | 3 |
4 | 2001 | low | 1425.0 | 4 |
5 | 2005 | missing | 0.0 | 4 |
6 | 2004 | missing | 0.0 | 4 |
7 | 2002 | low | 1500.0 | 7 |
8 | 2003 | missing | 0.0 | 7 |
9 | 2005 | medium | 8249.94 | 16 |
10 | 2003 | low | 1443.48 | 23 |
11 | 2001 | medium | 9580.0 | 25 |
12 | 2004 | low | 1308.0 | 25 |
13 | 2002 | medium | 7815.33 | 30 |
14 | 2003 | medium | 8027.28 | 67 |
15 | 2005 | high | 2.0684e7 | 82 |
16 | 2004 | medium | 7946.05 | 91 |
17 | 2003 | high | 2.14431e7 | 276 |
18 | 2001 | high | 2.13646e7 | 289 |
19 | 2002 | high | 2.17604e7 | 320 |
20 | 2004 | high | 1.88698e7 | 336 |
Logging can also be disabled.
TidierData_set("log", false) # disable logging
false
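As a smaller illustration of how logging can help surface a silent bug, the sketch below runs a short chain on a hand-built data frame. The data frame `df` and its columns `id` and `score` are made up for this example, and DataFrames.jl is assumed to be installed alongside TidierData.jl. The filter on `score` is expected to drop the rows where `score` is missing without raising an error, which is the kind of change the log output is meant to make visible. The exact wording of the log messages is not reproduced here because it may vary between versions.

```julia
using TidierData
using DataFrames

# Hypothetical toy data: two of the five scores are missing.
df = DataFrame(id = 1:5, score = [10, missing, 30, missing, 50])

TidierData_set("log", true)   # enable logging for the chain below

result = @chain df begin
    # Rows where the comparison evaluates to missing are not kept,
    # so this step removes more rows than the condition alone suggests.
    @filter(score > 20)
    @mutate(score_cat = case_when(score > 40 => "high",
                                  true => "low"))
end

TidierData_set("log", false)  # restore the default
```

With logging enabled, comparing the reported row counts before and after the filter step is one way to notice that rows with missing values were removed along with the low scores.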
code
Code printing is set to false by default. Enabling this setting prints the underlying DataFrames.jl code created by TidierData.jl macros. It can be useful for debugging, especially for users who understand DataFrames.jl syntax, or for filing bug reports.
TidierData_set("code", true) # enable macro code output
@chain movies begin
    @select(Title, Year, Budget)
    @slice_sample(n = 10)
end
Row | Title | Year | Budget |
---|---|---|---|
 | String | Int32 | Int32? |
1 | Dangerous Davies - The Last Detective | 1981 | missing |
2 | Stolen Heart | 1998 | missing |
3 | Don't Go Breaking My Heart | 1999 | missing |
4 | Transylvania 6-5000 | 1985 | missing |
5 | Amores | 1998 | missing |
6 | Polar | 1984 | missing |
7 | Angel | 1966 | missing |
8 | Criminally Insane 2 | 1987 | missing |
9 | Burgtheater | 1936 | missing |
10 | Carrier, The | 1988 | missing |
Code printing can also be disabled.
TidierData_set("code", false) # disable macro code output
false
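For readers who want to relate the printed code to DataFrames.jl, the following sketch compares a single macro call with a hand-written DataFrames.jl call that produces the same result. The toy data frame `df` and the `transform` call are assumptions made for illustration. The hand-written call is not necessarily the exact code that TidierData.jl prints, only one equivalent way to express the same transformation.

```julia
using TidierData
using DataFrames

# Hypothetical toy data for illustration.
df = DataFrame(a = 1:3)

TidierData_set("code", true)   # print the generated code for the macro below
tidier_result = @mutate(df, b = a + 1)
TidierData_set("code", false)  # restore the default

# A hand-written DataFrames.jl call that yields the same columns.
# This is an illustrative equivalent, not the output of the code setting.
manual_result = transform(df, :a => ByRow(x -> x + 1) => :b)

println(tidier_result == manual_result)
```

Checking a macro against a hand-written equivalent like this can help narrow down whether unexpected behavior comes from the generated code or from the data itself, which is useful context when filing a bug report.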
This page was generated using Literate.jl.