Flexible Syntax and UDFs
TidierDB is unique in its statement parsing flexiblility. This means that in addition to using any built in SQL database functions, user defined functions (or UDFS) are readily avaialable in TidierDB.
using TidierDB # DuckDB is reexported by TidierDB
db = connect(duckdb());
df = DataFrame(a = [1, 2, 3], b = [1, 2, 3]);
dfv = dt(db, df, "df_view");
UDFs in DuckDB¤
Once created, UDFs can immediately be used in with @mutate
or @transmute
A more in depth disccusion of UDFs in DuckDB.jl can be found here. There are 3 steps 1) Define a function in julia, 2) create the scalar function in DuckDB, and 3) register it
bino = (a, b) -> (a + b) * (a + b)
fun = DuckDB.@create_scalar_function bino(a::Int, b::Int)::Int
DuckDB.register_scalar_function(db, fun)
@chain t(dfv) @mutate(c = bino(a, b)) @collect
Row | a | b | c |
---|---|---|---|
Int64 | Int64 | Int64 | |
1 | 1 | 1 | 4 |
2 | 2 | 2 | 16 |
3 | 3 | 3 | 36 |
Notably, when the function is redefined (with the same arguments) in julia, the DuckDB UDF representation will change as well.
bino = (a, b) -> (a + b) * (a - b)
@chain t(dfv) @mutate(c = bino(a, b)) @collect
Row | a | b | c |
---|---|---|---|
Int64 | Int64 | Int64 | |
1 | 1 | 1 | 0 |
2 | 2 | 2 | 0 |
3 | 3 | 3 | 0 |
DuckDB function chaining¤
In DuckDB, functions can be chained together with .
. TidierDB lets you leverage this.
mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv";
mtcars = dt(db, mtcars_path);
@chain t(mtcars) begin
@mutate(model2 = model.upper().string_split(" ").list_aggr("string_agg",".").concat("."))
@select model model2
@head()
@collect
end
Row | model | model2 |
---|---|---|
String | String | |
1 | Mazda RX4 | MAZDA.RX4. |
2 | Mazda RX4 Wag | MAZDA.RX4.WAG. |
3 | Datsun 710 | DATSUN.710. |
4 | Hornet 4 Drive | HORNET.4.DRIVE. |
5 | Hornet Sportabout | HORNET.SPORTABOUT. |
6 | Valiant | VALIANT. |
rowid
and pseudocolumns¤
When a table is not being read directly from a file, rowid
is avaialable for use. In general, TidierDB should support all pseudocolumns.
copy_to(db, mtcars_path, "mtcars"); # copying table in for demostration purposes
@chain db_table(db, :mtcars) begin
@filter(rowid == 4)
@select(model:hp)
@collect
end
1×5 DataFrame
Row │ model mpg cyl disp hp
│ String? Float64? Int64? Float64? Int64?
─────┼───────────────────────────────────────────────────────
1 │ Hornet Sportabout 18.7 8 360.0 175
This page was generated using Literate.jl.