Skip to content

Aggregate and Window Functions

TidierDB supports all aggregate functions accross the supported databases, as well as window functions.

Aggregate Functions¤

@summarize, by default, supports all aggregate functions built into a SQL database, with the exception that any ' that would be used in SQL should be replaced wiht ".

using TidierDB
db = connect(duckdb());
mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv";
mtcars = db_table(db, mtcars_path);

Aggregate Functions in @summarize¤

Lets use the DuckDB kurtosis aggregate function

@chain t(mtcars) begin
     @group_by cyl
     @summarize(kurt = kurtosis(mpg))
     @collect
end
3×2 DataFrame
Rowcylkurt
Int64Float64
16-1.82944
24-1.43411
380.330061

Aggregate Functions in @mutate¤

By default, @mutate/@transmute supports (however, you can easily expand this list)

  • maximum, minimum, mean, std, sum, cumsum

To use aggregate sql functions that are built in to any database backend, but exist outside of the TidierDB parser list above, simply wrap the function call in agg()

@chain t(mtcars) begin
     @group_by(cyl)
     @mutate(kurt = agg(kurtosis(mpg)))
     @select cyl mpg kurt
     @head()
     @collect
end
6×3 DataFrame
Rowcylmpgkurt
Int64Float64Float64
1621.0-1.82944
2621.0-1.82944
3621.4-1.82944
4618.1-1.82944
5619.2-1.82944
6617.8-1.82944

Alternatively , if you anticipate regularly using specific aggregate functions, you can update the underlying parser avoid using agg all together

push!(TidierDB.window_agg_fxns, :kurtosis);
@chain t(mtcars) begin
     @group_by(cyl)
     @mutate(kurt = kurtosis(mpg))
     @select cyl mpg kurt
     @head()
     @collect
end
6×3 DataFrame
Rowcylmpgkurt
Int64Float64Float64
1621.0-1.82944
2621.0-1.82944
3621.4-1.82944
4618.1-1.82944
5619.2-1.82944
6617.8-1.82944

Window Functions¤

TidierDB's @mutate/@transmute support all of the window functions below

  • lead, lag, dense_rank, nth_value, ntile, rank_dense, row_number, first_value, last_value, cume_dist

When ordering a window function, @arrange should not be used. Rather, use @window_order or, preferably, _order and _frame in @mutate.

@chain t(mtcars) begin
    @mutate(row_id = row_number(),
        _by = cyl,
        _order = mpg # _frame is not used in this example
        )
    @collect
end
32×13 DataFrame
Rowmodelmpgcyldisphpdratwtqsecvsamgearcarbrow_id
StringFloat64Int64Float64Int64Float64Float64Float64Int64Int64Int64Int64Int64
1Merc 280C17.86167.61233.923.4418.910441
2Valiant18.16225.01052.763.4620.2210312
3Merc 28019.26167.61233.923.4418.310443
4Ferrari Dino19.76145.01753.622.7715.501564
5Mazda RX421.06160.01103.92.6216.4601445
6Mazda RX4 Wag21.06160.01103.92.87517.0201446
7Hornet 4 Drive21.46258.01103.083.21519.4410317
8Volvo 142E21.44121.01094.112.7818.611421
9Toyota Corona21.54120.1973.72.46520.0110312
10Datsun 71022.84108.0933.852.3218.6111413
11Merc 23022.84140.8953.923.1522.910424
12Merc 240D24.44146.7623.693.1920.010425
13Porsche 914-226.04120.3914.432.1416.701526
14Fiat X1-927.3479.0664.081.93518.911417
15Honda Civic30.4475.7524.931.61518.5211428
16Lotus Europa30.4495.11133.771.51316.911529
17Fiat 12832.4478.7664.082.219.47114110
18Toyota Corolla33.9471.1654.221.83519.9114111
19Cadillac Fleetwood10.48472.02052.935.2517.9800341
20Lincoln Continental10.48460.02153.05.42417.8200342
21Camaro Z2813.38350.02453.733.8415.4100343
22Duster 36014.38360.02453.213.5715.8400344
23Chrysler Imperial14.78440.02303.235.34517.4200345
24Maserati Bora15.08301.03353.543.5714.601586
25Merc 450SLC15.28275.81803.073.7818.000337
26AMC Javelin15.28304.01503.153.43517.300328
27Dodge Challenger15.58318.01502.763.5216.8700329
28Ford Pantera L15.88351.02644.223.1714.5015410
29Merc 450SE16.48275.81803.074.0717.4003311
30Merc 450SL17.38275.81803.073.7317.6003312
31Hornet Sportabout18.78360.01753.153.4417.02003213
32Pontiac Firebird19.28400.01753.083.84517.05003214

The above query could have alternatively been written as

@chain t(mtcars) begin
    @group_by cyl
    @window_order mpg
    @mutate(row_id = row_number())
    @collect
end
32×13 DataFrame
Rowmodelmpgcyldisphpdratwtqsecvsamgearcarbrow_id
StringFloat64Int64Float64Int64Float64Float64Float64Int64Int64Int64Int64Int64
1Merc 280C17.86167.61233.923.4418.910441
2Valiant18.16225.01052.763.4620.2210312
3Merc 28019.26167.61233.923.4418.310443
4Ferrari Dino19.76145.01753.622.7715.501564
5Mazda RX421.06160.01103.92.6216.4601445
6Mazda RX4 Wag21.06160.01103.92.87517.0201446
7Hornet 4 Drive21.46258.01103.083.21519.4410317
8Volvo 142E21.44121.01094.112.7818.611421
9Toyota Corona21.54120.1973.72.46520.0110312
10Datsun 71022.84108.0933.852.3218.6111413
11Merc 23022.84140.8953.923.1522.910424
12Merc 240D24.44146.7623.693.1920.010425
13Porsche 914-226.04120.3914.432.1416.701526
14Fiat X1-927.3479.0664.081.93518.911417
15Honda Civic30.4475.7524.931.61518.5211428
16Lotus Europa30.4495.11133.771.51316.911529
17Fiat 12832.4478.7664.082.219.47114110
18Toyota Corolla33.9471.1654.221.83519.9114111
19Cadillac Fleetwood10.48472.02052.935.2517.9800341
20Lincoln Continental10.48460.02153.05.42417.8200342
21Camaro Z2813.38350.02453.733.8415.4100343
22Duster 36014.38360.02453.213.5715.8400344
23Chrysler Imperial14.78440.02303.235.34517.4200345
24Maserati Bora15.08301.03353.543.5714.601586
25Merc 450SLC15.28275.81803.073.7818.000337
26AMC Javelin15.28304.01503.153.43517.300328
27Dodge Challenger15.58318.01502.763.5216.8700329
28Ford Pantera L15.88351.02644.223.1714.5015410
29Merc 450SE16.48275.81803.074.0717.4003311
30Merc 450SL17.38275.81803.073.7317.6003312
31Hornet Sportabout18.78360.01753.153.4417.02003213
32Pontiac Firebird19.28400.01753.083.84517.05003214

This page was generated using Literate.jl.