Writing Functions with TidierDB Chains
On this page, we'll briefly explore how to use TidierDB macros and $
witth @eval
to bulid a function
For a more indepth explanation, please check out the TidierData page on interpolation
using TidierDB, DataFrames;
db = connect(duckdb());
df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9],
groups = [i % 2 == 0 ? "aa" : "bb" for i in 1:10],
value = repeat(1:5, 2),
percent = 0.1:0.1:1.0);
copy_to(db, df, "dfm");
df_mem = db_table(db, "dfm");
Interpolation¤
Variables are interpoated using @eval
and $
. Place @eval
before you begin the chain or call a TidierDb macro Why Use @eval? In Julia, macros like @filter are expanded at parse time, before runtime variables like vals are available. By using @eval, we force the expression to be evaluated at runtime, allowing us to interpolate the variable into the macro.
num = [3];
column = :id;
@eval @chain t(df_mem) begin
@filter(value in $num)
@select($column)
@collect
end
Row | id |
---|---|
String | |
1 | AC |
2 | AH |
Function set up¤
Begin by defining your function as your normally would, but before @chain
you need to use @eval
. For the variables to be interpolated in need to be started with $
function test(vals, cols)
@eval @chain t(df_mem) begin
@filter(value in $vals)
@select($cols)
@collect
end
end;
vals = [1, 2, 3, 3];
test(vals, [:groups, :value, :percent])
Row | groups | value | percent |
---|---|---|---|
String | Int64 | Float64 | |
1 | bb | 1 | 0.1 |
2 | aa | 1 | 0.6 |
3 | aa | 2 | 0.2 |
4 | bb | 2 | 0.7 |
5 | bb | 3 | 0.3 |
6 | aa | 3 | 0.8 |
Now with a new variable
other_vals = [1];
cols = [:value, :percent];
test(other_vals, cols)
Row | value | percent |
---|---|---|
Int64 | Float64 | |
1 | 1 | 0.1 |
2 | 1 | 0.6 |
Defineing a new function
function gs(groups, aggs, new_name, threshold)
@eval @chain t(df_mem) begin
@group_by($groups)
@summarize($new_name = mean($aggs))
@filter($new_name > $threshold)
@collect
end
end;
gs(:groups, :percent, :mean_percent, .5)
Row | groups | mean_percent |
---|---|---|
String | Float64 | |
1 | aa | 0.6 |
Change the column and threshold
gs(:groups, :value, :mean_value, 2)
Row | groups | mean_value |
---|---|---|
String | Float64 | |
1 | bb | 3.0 |
2 | aa | 3.0 |
Write pipeline function to use inside of chains¤
Lets say there is a particular sequence of macros that you want repeatedly use. Wrap this series into a function that accepts a t(query
as its first argument and returns a SQLquery
and you can easily resuse it.
function moving_aggs(table, start, stop, group, order, col)
qry = @eval @chain $table begin
@group_by $group
@window_frame $start $stop
@window_order $order
@mutate(across($col, (minimum, maximum, mean)))
end
return qry
end;
@chain t(df_mem) begin
moving_aggs(-2, 1, :groups, :percent, :value)
@filter value_mean > 2.75
@aside @show_query _
@collect
end
Row | id | groups | value | percent | value_minimum | value_maximum | value_mean |
---|---|---|---|---|---|---|---|
String | String | Int64 | Float64 | Int64 | Int64 | Float64 | |
1 | AB | aa | 2 | 0.2 | 2 | 4 | 3.0 |
2 | AH | aa | 3 | 0.8 | 1 | 5 | 3.25 |
3 | AJ | aa | 5 | 1.0 | 1 | 5 | 3.0 |
4 | AC | bb | 3 | 0.3 | 1 | 5 | 3.0 |
5 | AG | bb | 2 | 0.7 | 2 | 5 | 3.5 |
6 | AI | bb | 4 | 0.9 | 2 | 5 | 3.66667 |
Filtering before the window functions
@chain t(df_mem) begin
@filter(value >=2 )
moving_aggs(-1, 1, :groups, :percent, :value)
@aside @show_query _
@collect
end
Row | id | groups | value | percent | value_minimum | value_maximum | value_mean |
---|---|---|---|---|---|---|---|
String | String | Int64 | Float64 | Int64 | Int64 | Float64 | |
1 | AB | aa | 2 | 0.2 | 2 | 4 | 3.0 |
2 | AD | aa | 4 | 0.4 | 2 | 4 | 3.0 |
3 | AH | aa | 3 | 0.8 | 3 | 5 | 4.0 |
4 | AJ | aa | 5 | 1.0 | 3 | 5 | 4.0 |
5 | AC | bb | 3 | 0.3 | 3 | 5 | 4.0 |
6 | AE | bb | 5 | 0.5 | 2 | 5 | 3.33333 |
7 | AG | bb | 2 | 0.7 | 2 | 5 | 3.66667 |
8 | AI | bb | 4 | 0.9 | 2 | 4 | 3.0 |
Interpolating Queries¤
To use a prior, uncollected TidierDB query in other TidierDB macros, interpolate the needed query without showing or collecting it
ok = @chain t(df_mem) @summarize(mean = mean(value));
The mean value represented in SQL from the above is 3
With @filter
@eval @chain t(df_mem) begin
@filter( value > $ok)
@collect
end
Row | id | groups | value | percent |
---|---|---|---|---|
String | String | Int64 | Float64 | |
1 | AD | aa | 4 | 0.4 |
2 | AE | bb | 5 | 0.5 |
3 | AI | bb | 4 | 0.9 |
4 | AJ | aa | 5 | 1.0 |
With @mutate
@eval @chain t(df_mem) begin
@mutate(value2 = value + $ok)
@collect
end
Row | id | groups | value | percent | value2 |
---|---|---|---|---|---|
String | String | Int64 | Float64 | Float64 | |
1 | AA | bb | 1 | 0.1 | 4.0 |
2 | AB | aa | 2 | 0.2 | 5.0 |
3 | AC | bb | 3 | 0.3 | 6.0 |
4 | AD | aa | 4 | 0.4 | 7.0 |
5 | AE | bb | 5 | 0.5 | 8.0 |
6 | AF | aa | 1 | 0.6 | 4.0 |
7 | AG | bb | 2 | 0.7 | 5.0 |
8 | AH | aa | 3 | 0.8 | 6.0 |
9 | AI | bb | 4 | 0.9 | 7.0 |
10 | AJ | aa | 5 | 1.0 | 8.0 |
With @summarize
@eval @chain t(df_mem) begin
@summarize(value = mean(value) * $ok)
@collect
end
Row | value |
---|---|
Float64 | |
1 | 9.0 |
This page was generated using Literate.jl.