Skip to content

Flexible Syntax and UDFs

TidierDB is unique in its statement parsing flexiblility. This means that in addition to using any built in SQL database functions, user defined functions (or UDFS) are readily avaialable in TidierDB.

using TidierDB # DuckDB is reexported by TidierDB
db = connect(duckdb());
df = DataFrame(a = [1, 2, 3], b = [1, 2, 3]);
dfv = dt(db, df, "df_view");

UDFs in DuckDB¤

Once created, UDFs can immediately be used in with @mutate or @transmute A more in depth disccusion of UDFs in DuckDB.jl can be found here. There are 3 steps 1) Define a function in julia, 2) create the scalar function in DuckDB, and 3) register it

bino = (a, b) -> (a + b) * (a + b)
fun = DuckDB.@create_scalar_function bino(a::Int, b::Int)::Int
DuckDB.register_scalar_function(db, fun)
@chain t(dfv) @mutate(c = bino(a, b)) @collect
3×3 DataFrame
Rowabc
Int64Int64Int64
1114
22216
33336

Notably, when the function is redefined (with the same arguments) in julia, the DuckDB UDF representation will change as well.

bino = (a, b) -> (a + b) * (a - b)
@chain t(dfv) @mutate(c = bino(a, b)) @collect
3×3 DataFrame
Rowabc
Int64Int64Int64
1110
2220
3330

DuckDB function chaining¤

In DuckDB, functions can be chained together with .. TidierDB lets you leverage this.

mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv";
mtcars = dt(db, mtcars_path);
@chain t(mtcars) begin
    @mutate(model2 = model.upper().string_split(" ").list_aggr("string_agg",".").concat("."))
    @select model model2
    @head()
    @collect
end
6×2 DataFrame
Rowmodelmodel2
StringString
1Mazda RX4MAZDA.RX4.
2Mazda RX4 WagMAZDA.RX4.WAG.
3Datsun 710DATSUN.710.
4Hornet 4 DriveHORNET.4.DRIVE.
5Hornet SportaboutHORNET.SPORTABOUT.
6ValiantVALIANT.

rowid and pseudocolumns¤

When a table is not being read directly from a file, rowid is avaialable for use. In general, TidierDB should support all pseudocolumns.

copy_to(db, mtcars_path, "mtcars"); # copying table in for demostration purposes
@chain db_table(db, :mtcars) begin
      @filter(rowid == 4)
      @select(model:hp)
      @collect
end
1×5 DataFrame
 Row │ model              mpg       cyl     disp      hp
     │ String?            Float64?  Int64?  Float64?  Int64?
─────┼───────────────────────────────────────────────────────
   1 │ Hornet Sportabout      18.7       8     360.0     175

This page was generated using Literate.jl.