Interpolation

The !! ("bang bang") operator interpolates the values of variables from the global environment into your code. This operator is borrowed from the R rlang package. We may eventually switch to native Julia interpolation, but because native interpolation introduces some complexity of its own, we plan to continue supporting !! interpolation.

To interpolate multiple variables, the rlang R package uses the !!! "triple bang" operator. However, in Tidier.jl, the !! "bang bang" operator can be used to interpolate either single or multiple values as shown in the examples below.

Since the !! operator can only access variables in the global environment, we will set these variables in a somewhat roundabout way for the purposes of documentation. However, in interactive use, you can simply write myvar = :b instead of wrapping this code inside of an @eval() macro as is done here.
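
To make the idea of interpolating from the global environment concrete, here is a minimal sketch in plain Julia of how an operator like !! could be resolved against globals in Main. This is not Tidier.jl's implementation: the helpers is_unary_not, is_bangbang, and interpolate_bangbang, and the globals demo_scale and demo_col, are hypothetical names introduced only for illustration, and the sketch handles single values only.

```julia
# Hypothetical helpers for illustration only; not part of Tidier.jl.

# `!x` parses as a call to `!` with a single argument.
is_unary_not(ex) = ex isa Expr && ex.head == :call &&
                   length(ex.args) == 2 && ex.args[1] == Symbol("!")

# `!!name` parses as the nested call `!(!(name))`.
is_bangbang(ex) = is_unary_not(ex) && is_unary_not(ex.args[2]) &&
                  ex.args[2].args[2] isa Symbol

# Walk an expression and replace each `!!name` with the value bound to
# `name` in module `mod` (Main by default).
function interpolate_bangbang(ex, mod::Module = Main)
    ex isa Expr || return ex  # literals and bare symbols pass through unchanged
    if is_bangbang(ex)
        value = getfield(mod, ex.args[2].args[2])
        # Quote symbols so they stay symbols rather than becoming variable lookups.
        return value isa Symbol ? QuoteNode(value) : value
    end
    return Expr(ex.head, map(arg -> interpolate_bangbang(arg, mod), ex.args)...)
end

# Set globals in Main the same way the documentation examples do.
@eval(Main, demo_scale = 2)
@eval(Main, demo_col = :b)

# A numeric global is spliced in as a value; `radius` is left untouched.
println(interpolate_bangbang(:(area = !!demo_scale * radius^2)))

# A symbol global is spliced in as a quoted symbol, that is, a column reference.
println(interpolate_bangbang(:(select(!!demo_col))))
```

Note that the bare name radius is not looked up anywhere, which mirrors the rule that unquoted names are treated as column names.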

Note: myvar = :b, myvar = (:a, :b), and myvar = [:a, :b] all refer to columns with those names. On the other hand, myvar = "b", myvar = ("a", "b"), and myvar = ["a", "b"] will interpolate those values as strings. See below for examples.
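
The distinction in the note comes down to Julia's Symbol and String types. The following plain-Julia sketch (no Tidier.jl required; the variable names are illustrative) shows that the two are different types and never compare equal, which is why one can stand for a column name and the other for a value.

```julia
col_ref = :b    # a Symbol: used above to refer to a column
value   = "b"   # a String: used above as a value to compare against

println(typeof(col_ref))            # Symbol
println(typeof(value))              # String
println(col_ref == value)           # false: a Symbol is never equal to a String

# Explicit conversion is possible in either direction.
println(Symbol(value) == col_ref)   # true
println(String(col_ref) == value)   # true

# Tuples and vectors keep the element type of what they contain.
println(eltype([:a, :b]))           # Symbol
println(eltype(["a", "b"]))         # String
```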

using Tidier
using RDatasets

df = DataFrame(a = string.(repeat('a':'e', inner = 2)),
               b = [1,1,1,2,2,2,3,3,3,4],
               c = 11:20)
10×3 DataFrame
 Row │ a       b      c
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ a           1     11
   2 │ a           1     12
   3 │ b           1     13
   4 │ b           2     14
   5 │ c           2     15
   6 │ c           2     16
   7 │ d           3     17
   8 │ d           3     18
   9 │ e           3     19
  10 │ e           4     20

Select the column (because myvar contains a symbol)

@eval(Main, myvar = :b)

@chain df begin
  @select(!!myvar)
end
10×1 DataFrame
 Row │ b
     │ Int64
─────┼───────
   1 │     1
   2 │     1
   3 │     1
   4 │     2
   5 │     2
   6 │     2
   7 │     3
   8 │     3
   9 │     3
  10 │     4

Select multiple variables (tuple of symbols)

@eval(Main, myvars_tuple = (:a, :b))

@chain df begin
  @select(!!myvars_tuple)
end
10×2 DataFrame
 Row │ a       b
     │ String  Int64
─────┼───────────────
   1 │ a           1
   2 │ a           1
   3 │ b           1
   4 │ b           2
   5 │ c           2
   6 │ c           2
   7 │ d           3
   8 │ d           3
   9 │ e           3
  10 │ e           4

Select multiple variables (vector of symbols)

@eval(Main, myvars_vector = [:a, :b])

@chain df begin
  @select(!!myvars_vector)
end
10×2 DataFrame
 Row │ a       b
     │ String  Int64
─────┼───────────────
   1 │ a           1
   2 │ a           1
   3 │ b           1
   4 │ b           2
   5 │ c           2
   6 │ c           2
   7 │ d           3
   8 │ d           3
   9 │ e           3
  10 │ e           4

Filter rows containing the value of myvar_string (because myvar_string contains a string)

@eval(Main, myvar_string = "b")

@chain df begin
  @filter(a == !!myvar_string)
end
2×3 DataFrame
 Row │ a       b      c
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ b           1     13
   2 │ b           2     14

Filtering rows works similarly using in.

Note that for in to work here, we have to wrap the interpolated string in [] to make a one-element vector. Otherwise, the string itself is treated as a collection of characters, which are a different data type from strings.
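
The following plain-Julia sketch (no Tidier.jl required; the variable name needle is illustrative) shows why the brackets matter: iterating a string yields characters, so a string can only be found with in inside a collection whose elements are strings.

```julia
needle = "b"

# A string iterates as characters, not as strings.
println(eltype(needle))               # Char
println(collect(needle) == ['b'])     # true

# A character is found in the string, because its elements are characters.
println('b' in needle)                # true

# Wrapping the string in [] gives a one-element vector of strings,
# so a string-to-string membership test works as intended.
println(needle in [needle])           # true
println(needle in ["a", "c"])         # false
```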

@eval(Main, myvar_string = "b")

@chain df begin
  @filter(a in [!!myvar_string])
end
2×3 DataFrame
 Row │ a       b      c
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ b           1     13
   2 │ b           2     14

You can also use this for a tuple or vector of strings.

@eval(Main, myvars_string = ("a", "b"))

@chain df begin
  @filter(a in !!myvars_string)
end
4×3 DataFrame
 Row │ a       b      c
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ a           1     11
   2 │ a           1     12
   3 │ b           1     13
   4 │ b           2     14

Mutate one variable

@eval(Main, myvar = :b)

@chain df begin
  @mutate(!!myvar = !!myvar + 1)
end
10×3 DataFrame
 Row │ a       b      c
     │ String  Int64  Int64
─────┼──────────────────────
   1 │ a           2     11
   2 │ a           2     12
   3 │ b           2     13
   4 │ b           3     14
   5 │ c           3     15
   6 │ c           3     16
   7 │ d           4     17
   8 │ d           4     18
   9 │ e           4     19
  10 │ e           5     20

Summarize across one variable

@eval(Main, myvar = :b)

@chain df begin
  @summarize(across(!!myvar, mean))
end
1×1 DataFrame
 Row │ b_mean
     │ Float64
─────┼─────────
   1 │     2.2

Summarize across multiple variables

@eval(Main, myvars_tuple = (:b, :c))

@chain df begin
  @summarize(across(!!myvars_tuple, (mean, minimum, maximum)))
end
1×6 DataFrame
 Row │ b_mean   c_mean   b_minimum  c_minimum  b_maximum  c_maximum
     │ Float64  Float64  Int64      Int64      Int64      Int64
─────┼──────────────────────────────────────────────────────────────
   1 │     2.2     15.5          1         11          4         20

Group by multiple interpolated variables

@eval(Main, myvars_tuple = (:a, :b))

@chain df begin
  @group_by(!!myvars_tuple)
  @summarize(c = mean(c))
end

GroupedDataFrame with 5 groups based on key: a

First Group (1 row): a = "a"
 Row │ a       b      c
     │ String  Int64  Float64
─────┼────────────────────────
   1 │ a           1     11.5

Last Group (2 rows): a = "e"
 Row │ a       b      c
     │ String  Int64  Float64
─────┼────────────────────────
   1 │ e           3     19.0
   2 │ e           4     20.0

Global constants

Because global constants like pi are available in the Main module, they can also be accessed using interpolation. For example, let's calculate the areas of circles with radii from 1 to 5.

df = DataFrame(radius = 1:5)
5×1 DataFrame
 Row │ radius
     │ Int64
─────┼────────
   1 │      1
   2 │      2
   3 │      3
   4 │      4
   5 │      5

We can interpolate pi (from the Main module) to help with this.

@chain df begin
  @mutate(area = !!pi * radius^2)
end
5×2 DataFrame
 Row │ radius  area
     │ Int64   Float64
─────┼─────────────────
   1 │      1   3.14159
   2 │      2  12.5664
   3 │      3  28.2743
   4 │      4  50.2655
   5 │      5  78.5398
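
As a quick cross-check of the areas, the same arithmetic can be sketched in plain Julia without Tidier.jl. The variable names radii and areas are illustrative. The constant pi is defined in Base and is visible from Main, which is why Main.pi resolves to the same object.

```julia
radii = 1:5

# Broadcast the area formula over all radii; pi is converted to Float64.
areas = pi .* radii .^ 2

println(round.(areas, digits = 4))  # [3.1416, 12.5664, 28.2743, 50.2655, 78.5398]

# The constant reachable through Main is the same object as the bare name.
println(Main.pi === pi)             # true
```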

Alternative interpolation syntax

While interpolating using !! is concise and handy, it's not required. You can also access user-defined globals and global constants using the following syntax:

@chain df begin
  @mutate(area = Main.pi * radius^2)
end
5×2 DataFrame
 Row │ radius  area
     │ Int64   Float64
─────┼─────────────────
   1 │      1   3.14159
   2 │      2  12.5664
   3 │      3  28.2743
   4 │      4  50.2655
   5 │      5  78.5398

The key lesson with interpolation is that any bare unquoted variable is assumed to refer to a column name in the DataFrame. To refer to any variable outside of the DataFrame, you need to use either the !!variable or the Main.variable syntax.


This page was generated using Literate.jl.