Skip to content

Flexible Syntax and UDFs

TidierDB is unique in its statement parsing flexiblility. This means that using any built in SQL function or user defined functions (or UDFS) or is readily avaialable. To use any function built into a database in @mutate or in @summarize, simply correctly write the correctly, but replace ' with ". This also applies to any UDF. The example below will illustrate UDFs in the context of DuckDB.

# Set up the connection
using TidierDB  #rexports DuckDB
db = DuckDB.DB()
con = DuckDB.connect(db) # this will be important for UDFs
mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
mtcars = db_tbable(db, mtcars_path);

aggregate function in @summarize¤

Lets use the DuckDB kurtosis aggregate function

@chain t(mtcars) begin
      @group_by cyl
      @summarize(kurt = kurtosis(mpg))
      @collect
end
3×2 DataFrame
 Row │ cyl     kurt
     │ Int64?  Float64?
─────┼───────────────────
   1 │      4  -1.43411
   2 │      6  -1.82944
   3 │      8   0.330061

aggregate functions in @mutate¤

To aggregate sql functions that are builtin to any database, but exist outside of the TidierDB parser, simply wrap the function call in agg()

@chain t(mtcars) begin
    @group_by(cyl)
    @mutate(kurt = agg(kurtosis(mpg)))
    @select cyl mpg kurt
    @collect
end

32×3 DataFrame
 Row │ cyl     mpg       kurt
     │ Int64?  Float64?  Float64?
─────┼─────────────────────────────
   1 │      8      18.7   0.330061
   2 │      8      14.3   0.330061
   3 │      8      16.4   0.330061
   4 │      8      17.3   0.330061
   5 │      8      15.2   0.330061
   6 │      8      10.4   0.330061
   7 │      8      10.4   0.330061
  ⋮  │   ⋮        ⋮          ⋮
  27 │      6      21.0  -1.82944
  28 │      6      21.4  -1.82944
  29 │      6      18.1  -1.82944
  30 │      6      19.2  -1.82944
  31 │      6      17.8  -1.82944
  32 │      6      19.7  -1.82944
                    19 rows omitted
end

DuckDB function chaining¤

In DuckDB, functions can be chained together with .. TidierDB lets you leverage this.

@chain t(mtcars) begin
    @mutate(model2 = model.upper().string_split(" ").list_aggr("string_agg",".").concat("."))
    @select model model2
    @collect
end
32×2 DataFrame
 Row │ model              model2
     │ String?            String?
─────┼───────────────────────────────────────
   1 │ Mazda RX4          MAZDA.RX4.
   2 │ Mazda RX4 Wag      MAZDA.RX4.WAG.
   3 │ Datsun 710         DATSUN.710.
   4 │ Hornet 4 Drive     HORNET.4.DRIVE.
   5 │ Hornet Sportabout  HORNET.SPORTABOUT.
   6 │ Valiant            VALIANT.
   7 │ Duster 360         DUSTER.360.
  ⋮  │         ⋮                  ⋮
  27 │ Porsche 914-2      PORSCHE.914-2.
  28 │ Lotus Europa       LOTUS.EUROPA.
  29 │ Ford Pantera L     FORD.PANTERA.L.
  30 │ Ferrari Dino       FERRARI.DINO.
  31 │ Maserati Bora      MASERATI.BORA.
  32 │ Volvo 142E         VOLVO.142E.
                              19 rows omitted

rowid and pseudocolumns¤

When a table is not being read directly from a file, rowid is avaialable for use. In general, TidierDB should support all pseudocolumns.

copy_to(db, mtcars_path, "mtcars"); # copying table in for demostration purposes
@chain db_table(db, :mtcars) begin
      @filter(rowid == 4)
      @select(model:hp)
      @collect
end
1×5 DataFrame
 Row │ model              mpg       cyl     disp      hp
     │ String?            Float64?  Int64?  Float64?  Int64?
─────┼───────────────────────────────────────────────────────
   1 │ Hornet Sportabout      18.7       8     360.0     175

UDF SQLite Example¤

using SQLite
sql = connect(sqlite());
df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9],
                        groups = [i % 2 == 0 ? "aa" : "bb" for i in 1:10],
                        value = repeat(1:5, 2),
                        percent = 0.1:0.1:1.0);

copy_to(db, sql, "df_mem");
SQLite.@register sql function diff_of_squares(x, y)
              x^2 - y^2
              end;

@chain db_table(sql, "df_mem") begin
      @select(value, percent)
      @mutate(plus3 = diff_of_squares(value, percent))
      @collect
end
10×3 DataFrame
 Row │ value  percent  plus3
     │ Int64  Float64  Float64
─────┼─────────────────────────
   1 │     1      0.1     0.99
   2 │     2      0.2     3.96
   3 │     3      0.3     8.91
   4 │     4      0.4    15.84
   5 │     5      0.5    24.75
   6 │     1      0.6     0.64
   7 │     2      0.7     3.51
   8 │     3      0.8     8.36
   9 │     4      0.9    15.19
  10 │     5      1.0    24.0

How to create UDF in DuckDB¤

Example coming soon..


This page was generated using Literate.jl.