Flexible Syntax and UDFs
TidierDB is unique in its statement parsing flexiblility. This means that in addition to using any built in SQL database functions, user defined functions (or UDFS) are readily avaialable in TidierDB.
using TidierDB # DuckDB is reexported by TidierDB
db = connect(duckdb());
df = DataFrame(a = [1, 2, 3], b = [1, 2, 3]);
dfv = dt(db, df, "df_view");
UDFs in DuckDB¤
Once created, UDFs can immediately be used in with @mutate or @transmute A more in depth disccusion of UDFs in DuckDB.jl can be found here. There are 3 steps 1) Define a function in julia, 2) create the scalar function in DuckDB, and 3) register it
bino = (a, b) -> (a + b) * (a + b)
fun = DuckDB.@create_scalar_function bino(a::Int, b::Int)::Int
DuckDB.register_scalar_function(db, fun)
@chain dfv @mutate(c = bino(a, b)) @collect
| Row | a | b | c |
|---|---|---|---|
| Int64 | Int64 | Int64 | |
| 1 | 1 | 1 | 4 |
| 2 | 2 | 2 | 16 |
| 3 | 3 | 3 | 36 |
Notably, when the function is redefined (with the same arguments) in julia, the DuckDB UDF representation will change as well.
bino = (a, b) -> (a + b) * (a - b)
@chain dfv @mutate(c = bino(a, b)) @collect
| Row | a | b | c |
|---|---|---|---|
| Int64 | Int64 | Int64 | |
| 1 | 1 | 1 | 0 |
| 2 | 2 | 2 | 0 |
| 3 | 3 | 3 | 0 |
DuckDB function chaining¤
In DuckDB, functions can be chained together with .. TidierDB lets you leverage this.
mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv";
mtcars = dt(db, mtcars_path);
@chain mtcars begin
@mutate(model2 = model.upper().string_split(" ").list_aggr("string_agg",".").concat("."))
@select model model2
@head()
@collect
end
| Row | model | model2 |
|---|---|---|
| String | String | |
| 1 | Mazda RX4 | MAZDA.RX4. |
| 2 | Mazda RX4 Wag | MAZDA.RX4.WAG. |
| 3 | Datsun 710 | DATSUN.710. |
| 4 | Hornet 4 Drive | HORNET.4.DRIVE. |
| 5 | Hornet Sportabout | HORNET.SPORTABOUT. |
| 6 | Valiant | VALIANT. |
rowid and pseudocolumns¤
When a table is not being read directly from a file, rowid is avaialable for use. In general, TidierDB should support all pseudocolumns.
copy_to(db, mtcars_path, "mtcars"); # copying table in for demostration purposes
@chain dt(db, "mtcars") begin
@filter(rowid == 4)
@select(model:hp)
@collect
end
| Row | model | mpg | cyl | disp | hp |
|---|---|---|---|---|---|
| String | Float64 | Int64 | Float64 | Int64 | |
| 1 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 |
This page was generated using Literate.jl.