Writing Functions with TidierDB Chains
On this page, we'll briefly explore how to use TidierDB macros and $ witth @eval to bulid a function
For a more indepth explanation, please check out the TidierData page on interpolation
using TidierDB, DataFrames;
db = connect(duckdb());
df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9],
groups = [i % 2 == 0 ? "aa" : "bb" for i in 1:10],
value = repeat(1:5, 2),
percent = 0.1:0.1:1.0);
dfv = dt(db, df, "dfm");
Interpolation¤
Variables are interpoated using @eval and $. Place @eval before you begin the chain or call a TidierDb macro Why Use @eval? In Julia, macros like @filter are expanded at parse time, before runtime variables like vals are available. By using @eval, we force the expression to be evaluated at runtime, allowing us to interpolate the variable into the macro.
num = [3];
column = :id;
@eval @chain dfv begin
@filter(value in $num)
@select($column)
@collect
end
| Row | id |
|---|---|
| String | |
| 1 | AC |
| 2 | AH |
Function set up¤
Begin by defining your function as your normally would, but before @chain you need to use @eval. For the variables to be interpolated in need to be started with $
function test(vals, cols)
@eval @chain dfv begin
@filter(value in $vals)
@select($cols)
@collect
end
end;
vals = [1, 2, 3, 3];
test(vals, [:groups, :value, :percent])
| Row | groups | value | percent |
|---|---|---|---|
| String | Int64 | Float64 | |
| 1 | bb | 1 | 0.1 |
| 2 | aa | 2 | 0.2 |
| 3 | bb | 3 | 0.3 |
| 4 | aa | 1 | 0.6 |
| 5 | bb | 2 | 0.7 |
| 6 | aa | 3 | 0.8 |
Now with a new variable
other_vals = [1];
cols = [:value, :percent];
test(other_vals, cols)
| Row | value | percent |
|---|---|---|
| Int64 | Float64 | |
| 1 | 1 | 0.1 |
| 2 | 1 | 0.6 |
Defineing a new function
function gs(groups, aggs, new_name, threshold)
@eval @chain dfv begin
@group_by($groups)
@summarize($new_name = mean($aggs))
@filter($new_name > $threshold)
@collect
end
end;
gs(:groups, :percent, :mean_percent, .5)
| Row | groups | mean_percent |
|---|---|---|
| String | Float64 | |
| 1 | aa | 0.6 |
Change the column and threshold
gs(:groups, :value, :mean_value, 2)
| Row | groups | mean_value |
|---|---|---|
| String | Float64 | |
| 1 | bb | 3.0 |
| 2 | aa | 3.0 |
Write pipeline function to use inside of chains¤
Lets say there is a particular sequence of macros that you want repeatedly use. Wrap this series into a function that accepts a t(query as its first argument and returns a SQLquery and you can easily resuse it.
function moving_aggs(table, start, stop, group, order, col)
qry = @eval @chain $table begin
@group_by $group
@window_frame $start $stop
@window_order $order
@mutate(across($col, (minimum, maximum, mean)))
end
return qry
end;
@chain dfv begin
moving_aggs(-2, 1, :groups, :percent, :value)
@filter value_mean > 2.75
@aside @show_query _
@collect
end
| Row | id | groups | value | percent | value_minimum | value_maximum | value_mean |
|---|---|---|---|---|---|---|---|
| String | String | Int64 | Float64 | Int64 | Int64 | Float64 | |
| 1 | AC | bb | 3 | 0.3 | 1 | 5 | 3.0 |
| 2 | AG | bb | 2 | 0.7 | 2 | 5 | 3.5 |
| 3 | AI | bb | 4 | 0.9 | 2 | 5 | 3.66667 |
| 4 | AB | aa | 2 | 0.2 | 2 | 4 | 3.0 |
| 5 | AH | aa | 3 | 0.8 | 1 | 5 | 3.25 |
| 6 | AJ | aa | 5 | 1.0 | 1 | 5 | 3.0 |
Filtering before the window functions
@chain dfv begin
@filter(value >=2 )
moving_aggs(-1, 1, :groups, :percent, :value)
@aside @show_query _
@collect
end
| Row | id | groups | value | percent | value_minimum | value_maximum | value_mean |
|---|---|---|---|---|---|---|---|
| String | String | Int64 | Float64 | Int64 | Int64 | Float64 | |
| 1 | AC | bb | 3 | 0.3 | 3 | 5 | 4.0 |
| 2 | AE | bb | 5 | 0.5 | 2 | 5 | 3.33333 |
| 3 | AG | bb | 2 | 0.7 | 2 | 5 | 3.66667 |
| 4 | AI | bb | 4 | 0.9 | 2 | 4 | 3.0 |
| 5 | AB | aa | 2 | 0.2 | 2 | 4 | 3.0 |
| 6 | AD | aa | 4 | 0.4 | 2 | 4 | 3.0 |
| 7 | AH | aa | 3 | 0.8 | 3 | 5 | 4.0 |
| 8 | AJ | aa | 5 | 1.0 | 3 | 5 | 4.0 |
Interpolating Queries¤
To use a prior, uncollected TidierDB query in other TidierDB macros, interpolate the needed query without showing or collecting it
ok = @chain dfv @summarize(mean = mean(value));
The mean value represented in SQL from the above is 3
With @filter
@eval @chain dfv begin
@filter( value > $ok)
@collect
end
| Row | id | groups | value | percent |
|---|---|---|---|---|
| String | String | Int64 | Float64 | |
| 1 | AD | aa | 4 | 0.4 |
| 2 | AE | bb | 5 | 0.5 |
| 3 | AI | bb | 4 | 0.9 |
| 4 | AJ | aa | 5 | 1.0 |
With @mutate
@eval @chain dfv begin
@mutate(value2 = value + $ok)
@collect
end
| Row | id | groups | value | percent | value2 |
|---|---|---|---|---|---|
| String | String | Int64 | Float64 | Float64 | |
| 1 | AA | bb | 1 | 0.1 | 4.0 |
| 2 | AB | aa | 2 | 0.2 | 5.0 |
| 3 | AC | bb | 3 | 0.3 | 6.0 |
| 4 | AD | aa | 4 | 0.4 | 7.0 |
| 5 | AE | bb | 5 | 0.5 | 8.0 |
| 6 | AF | aa | 1 | 0.6 | 4.0 |
| 7 | AG | bb | 2 | 0.7 | 5.0 |
| 8 | AH | aa | 3 | 0.8 | 6.0 |
| 9 | AI | bb | 4 | 0.9 | 7.0 |
| 10 | AJ | aa | 5 | 1.0 | 8.0 |
With @summarize
@eval @chain dfv begin
@summarize(value = mean(value) * $ok)
@collect
end
| Row | value |
|---|---|
| Float64 | |
| 1 | 9.0 |
This page was generated using Literate.jl.