Skip to content

Aggregate and Window Functions

TidierDB supports all aggregate functions accross the supported databases, as well as window functions.

Aggregate Functions¤

@summarize, by default, supports all aggregate functions built into a SQL database, with the exception that any ' that would be used in SQL should be replaced wiht ".

using TidierDB
db = connect(duckdb());
mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv";
mtcars = dt(db, mtcars_path);

Aggregate Functions in @summarize¤

Lets use the DuckDB kurtosis aggregate function

@chain mtcars begin
     @group_by cyl
     @summarize(kurt = kurtosis(mpg))
     @collect
end
3×2 DataFrame
Rowcylkurt
Int64Float64
16-1.82944
24-1.43411
380.330061

Aggregate Functions in @mutate¤

By default, @mutate/@transmute supports (however, you can easily expand this list)

  • maximum, minimum, mean, std, sum, cumsum

To use aggregate sql functions that are built in to any database backend, but exist outside of the TidierDB parser list above, simply wrap the function call in agg()

@chain mtcars begin
     @group_by(cyl)
     @mutate(kurt = agg(kurtosis(mpg)))
     @select cyl mpg kurt
     @head()
     @collect
end
6×3 DataFrame
Rowcylmpgkurt
Int64Float64Float64
1621.0-1.82944
2621.0-1.82944
3621.4-1.82944
4618.1-1.82944
5619.2-1.82944
6617.8-1.82944

Alternatively , if you anticipate regularly using specific aggregate functions, you can update the underlying parser avoid using agg all together

push!(TidierDB.window_agg_fxns, :kurtosis);
@chain mtcars begin
     @group_by(cyl)
     @mutate(kurt = kurtosis(mpg))
     @select cyl mpg kurt
     @head()
     @collect
end
6×3 DataFrame
Rowcylmpgkurt
Int64Float64Float64
1621.0-1.82944
2621.0-1.82944
3621.4-1.82944
4618.1-1.82944
5619.2-1.82944
6617.8-1.82944

Window Functions¤

TidierDB's @mutate/@transmute support all of the window functions below

  • lead, lag, dense_rank, nth_value, ntile, rank_dense, row_number, first_value, last_value, cume_dist

When ordering a window function, @arrange should not be used. Rather, use @window_order or, preferably, _order and _frame in @mutate.

@chain mtcars begin
    @mutate(row_id = row_number(),
        _by = cyl,
        _order = mpg # _frame is not used in this example
        )
    @head()
    @collect
end
6×13 DataFrame
Rowmodelmpgcyldisphpdratwtqsecvsamgearcarbrow_id
StringFloat64Int64Float64Int64Float64Float64Float64Int64Int64Int64Int64Int64
1Merc 280C17.86167.61233.923.4418.910441
2Valiant18.16225.01052.763.4620.2210312
3Merc 28019.26167.61233.923.4418.310443
4Ferrari Dino19.76145.01753.622.7715.501564
5Mazda RX421.06160.01103.92.6216.4601445
6Mazda RX4 Wag21.06160.01103.92.87517.0201446

The above query could have alternatively been written as

@chain mtcars begin
    @group_by cyl
    @window_order mpg
    @mutate(row_id = row_number())
    @head()
    @collect
end
6×13 DataFrame
Rowmodelmpgcyldisphpdratwtqsecvsamgearcarbrow_id
StringFloat64Int64Float64Int64Float64Float64Float64Int64Int64Int64Int64Int64
1Merc 280C17.86167.61233.923.4418.910441
2Valiant18.16225.01052.763.4620.2210312
3Merc 28019.26167.61233.923.4418.310443
4Ferrari Dino19.76145.01753.622.7715.501564
5Mazda RX421.06160.01103.92.6216.4601445
6Mazda RX4 Wag21.06160.01103.92.87517.0201446

This page was generated using Literate.jl.