Skip to content

Writing Functions with TidierDB Chains

On this page, we'll briefly explore how to use TidierDB macros and $ witth @eval to bulid a function

For a more indepth explanation, please check out the TidierData page on interpolation

using TidierDB, DataFrames;

db = connect(duckdb());
df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9],
                        groups = [i % 2 == 0 ? "aa" : "bb" for i in 1:10],
                        value = repeat(1:5, 2),
                        percent = 0.1:0.1:1.0);
copy_to(db, df, "dfm");
df_mem = db_table(db, "dfm");

Interpolation¤

Variables are interpoated using @eval and $. Place @eval before you begin the chain or call a TidierDb macro Why Use @eval? In Julia, macros like @filter are expanded at parse time, before runtime variables like vals are available. By using @eval, we force the expression to be evaluated at runtime, allowing us to interpolate the variable into the macro.

num = [3];
column = :id;
@eval @chain t(df_mem) begin
        @filter(value in $num)
        @select($column)
        @collect
    end
2×1 DataFrame
Rowid
String
1AC
2AH

Function set up¤

Begin by defining your function as your normally would, but before @chain you need to use @eval. For the variables to be interpolated in need to be started with $

function test(vals, cols)
    @eval @chain t(df_mem) begin
        @filter(value in $vals)
        @select($cols)
        @collect
    end
end;

vals = [1,  2,  3, 3];
test(vals, [:groups, :value, :percent])
6×3 DataFrame
Rowgroupsvaluepercent
StringInt64Float64
1bb10.1
2aa10.6
3aa20.2
4bb20.7
5bb30.3
6aa30.8

Now with a new variable

other_vals = [1];
cols = [:value, :percent];
test(other_vals, cols)
2×2 DataFrame
Rowvaluepercent
Int64Float64
110.1
210.6

Defineing a new function

function gs(groups, aggs, new_name, threshold)
    @eval @chain t(df_mem) begin
        @group_by($groups)
        @summarize($new_name = mean($aggs))
        @filter($new_name > $threshold)
        @collect
    end
end;

gs(:groups, :percent, :mean_percent, .5)
1×2 DataFrame
Rowgroupsmean_percent
StringFloat64
1aa0.6

Change the column and threshold

gs(:groups, :value, :mean_value, 2)
2×2 DataFrame
Rowgroupsmean_value
StringFloat64
1bb3.0
2aa3.0

Write pipeline function to use inside of chains¤

Lets say there is a particular sequence of macros that you want repeatedly use. Wrap this series into a function that accepts a t(query as its first argument and returns a SQLquery and you can easily resuse it.

function moving_aggs(table, start, stop, group, order, col)
    qry = @eval @chain $table begin
        @group_by $group
        @window_frame $start $stop
        @window_order $order
        @mutate(across($col, (minimum, maximum, mean)))
    end
    return qry
end;

@chain t(df_mem) begin
    moving_aggs(-2, 1, :groups, :percent, :value)
    @filter value_mean > 2.75
    @aside @show_query _
    @collect
end
6×7 DataFrame
Rowidgroupsvaluepercentvalue_minimumvalue_maximumvalue_mean
StringStringInt64Float64Int64Int64Float64
1ABaa20.2243.0
2AHaa30.8153.25
3AJaa51.0153.0
4ACbb30.3153.0
5AGbb20.7253.5
6AIbb40.9253.66667

Filtering before the window functions

@chain t(df_mem) begin
    @filter(value >=2 )
    moving_aggs(-1, 1, :groups, :percent, :value)
    @aside @show_query _
    @collect
end
8×7 DataFrame
Rowidgroupsvaluepercentvalue_minimumvalue_maximumvalue_mean
StringStringInt64Float64Int64Int64Float64
1ABaa20.2243.0
2ADaa40.4243.0
3AHaa30.8354.0
4AJaa51.0354.0
5ACbb30.3354.0
6AEbb50.5253.33333
7AGbb20.7253.66667
8AIbb40.9243.0

Interpolating Queries¤

To use a prior, uncollected TidierDB query in other TidierDB macros, interpolate the needed query without showing or collecting it

ok = @chain t(df_mem) @summarize(mean = mean(value));

The mean value represented in SQL from the above is 3

With @filter

@eval @chain t(df_mem) begin
    @filter( value > $ok)
    @collect
end
4×4 DataFrame
Rowidgroupsvaluepercent
StringStringInt64Float64
1ADaa40.4
2AEbb50.5
3AIbb40.9
4AJaa51.0

With @mutate

@eval @chain t(df_mem) begin
    @mutate(value2 =  value + $ok)
    @collect
end
10×5 DataFrame
Rowidgroupsvaluepercentvalue2
StringStringInt64Float64Float64
1AAbb10.14.0
2ABaa20.25.0
3ACbb30.36.0
4ADaa40.47.0
5AEbb50.58.0
6AFaa10.64.0
7AGbb20.75.0
8AHaa30.86.0
9AIbb40.97.0
10AJaa51.08.0

With @summarize

@eval @chain t(df_mem) begin
    @summarize(value =  mean(value) * $ok)
    @collect
end
1×1 DataFrame
Rowvalue
Float64
19.0

This page was generated using Literate.jl.