Column names
When referring to column names, TidierData.jl is a bit unusual for a Julia package in that it does not use symbols. This is because TidierData.jl uses tidy expressions, a style of programming known in R as "non-standard evaluation." If you are creating a new column a containing the mean of column b, you would simply write a = mean(b).
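As a minimal sketch of the a = mean(b) example above: the data frame name and its values are made up for illustration, and Statistics is loaded explicitly on the assumption that mean() may not otherwise be in scope.

```julia
using TidierData
using Statistics  # loaded explicitly so that mean() is available

# Hypothetical data: a single numeric column named b
small = DataFrame(b = [1, 2, 3])

# Columns are written as bare names, not as symbols such as :a or :b
result = @chain small begin
  @mutate(a = mean(b))
end
```

Every row of the new column a holds the same value, because mean(b) summarizes the whole column.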
However, there may be times when you wish to create or refer to a column whose name contains a space. Let's start by creating a data frame with such column names.
using TidierData
df = DataFrame(var"my name" = ["Ada", "Twist"],
               var"my age" = [40, 50])
| Row | my name | my age |
|---|---|---|
| | String | Int64 |
| 1 | Ada | 40 |
| 2 | Twist | 50 |
To create a column name containing a space, we used the var"column name" notation. This is the standard Julia way to write an identifier that contains a space, and because DataFrame() is a regular Julia function rather than a TidierData.jl macro, it is what we need to use here.
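The following sketch uses only base Julia (version 1.3 or later is assumed) to show that var"..." is ordinary syntax and not something specific to data frames. The variable names older, person, and field_names are made up for illustration.

```julia
# var"..." builds an identifier from an arbitrary string
var"my age" = 40
older = var"my age" + 10

# The same notation works for keyword arguments and named-tuple fields,
# which is how DataFrame() received its column names above
person = (var"my name" = "Ada", var"my age" = 40)
field_names = keys(person)
```

Here field_names holds the symbols Symbol("my name") and Symbol("my age"), spaces included.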
This notation also works inside of TidierData.jl.
var"column name" notation
If we want to figure out the age of the people in our dataset a decade from today, we can use this same var"column name" notation inside of @mutate.
@chain df begin
  @mutate(var"age in 10 years" = var"my age" + 10)
end
| Row | my name | my age | age in 10 years |
|---|---|---|---|
| | String | Int64 | Int64 |
| 1 | Ada | 40 | 50 |
| 2 | Twist | 50 | 60 |
However, typing out the var"column name" notation can become cumbersome. TidierData.jl also supports a shorthand notation for column names containing spaces or other special characters: backticks.
Backtick notation
This same code could be written more concisely like this:
@chain df begin
  @mutate(`age in 10 years` = `my age` + 10)
end
| Row | my name | my age | age in 10 years |
|---|---|---|---|
| | String | Int64 | Int64 |
| 1 | Ada | 40 | 50 |
| 2 | Twist | 50 | 60 |
Backticks are an R convention. While they are not specific to the tidyverse, they are a convenient way to refer to column names that would otherwise not parse as a single entity. Backticks are supported in all TidierData.jl functions where column names may be referenced.
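As a short sketch of backticks outside of @mutate, the following filters and selects on the same example data. It relies on the statement above that backticks work wherever column names may be referenced; the result name older is made up for illustration.

```julia
using TidierData

# Same example data as above
df = DataFrame(var"my name" = ["Ada", "Twist"],
               var"my age" = [40, 50])

# Backticks in a row filter and in a column selection
older = @chain df begin
  @filter(`my age` > 40)
  @select(`my name`)
end
```

Only the row for Twist passes the filter, and only the my name column is kept.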
Cleaning up column names
Another option is to clean up the column names so that you do not have spaces to begin with. In R, this is usually accomplished using the janitor package. In Julia, the Cleaner.jl package provides this functionality, which we have wrapped inside of TidierData.jl.
@chain df begin
  @clean_names
end
| Row | my_name | my_age |
|---|---|---|
| | String | Int64 |
| 1 | Ada | 40 |
| 2 | Twist | 50 |
Although the default value for the case argument is "snake_case", you can also set this to "camelCase".
@chain df begin
  @clean_names(case = "camelCase")
end
| Row | myName | myAge |
|---|---|---|
| | String | Int64 |
| 1 | Ada | 40 |
| 2 | Twist | 50 |
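Once the names are cleaned, later steps can refer to the columns as bare names, with neither var"..." nor backticks. A minimal sketch with the default snake_case names, where the new column name age_in_10_years is chosen for illustration:

```julia
using TidierData

# Same example data as above
df = DataFrame(var"my name" = ["Ada", "Twist"],
               var"my age" = [40, 50])

# Clean the names first, then use them as ordinary identifiers
cleaned = @chain df begin
  @clean_names
  @mutate(age_in_10_years = my_age + 10)
end
```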
This page was generated using Literate.jl.