Skip to content

Joining Tables

TidierDB supports mutliple join types including equi-joins, nonequi-joins, and as of or rolling joins.

General Syntax¤

All joins share the same argument format

  • *_join(query, join_table, joining_keys...)

Equi Joins¤

Equi joins can be written in any of the following ways, and the key column will be dropped from the right hand (new) table to avoid duplication.

  • @left_join(t(table), "table2", key_col)
  • @left_join(t(table), "table2", key_col = key_col2)

To join mutliple columns, separate the different pairs with a ,

  • @left_join(t(table), "table2", key_col == key_col2, key2 == key2)

Inequality Joins¤

Inequality joins or non-equi-joins use the same syntax, just with a inequality operators

  • @left_join(t(table), "table2", key_col >= key_col2, key2 < key2)

AsOf¤

To use an AsOf or rolling join, simply wrap the inequality in `closest. Of note, at this time, only one inequality can be supported at a time with AsOf joins

  • @left_join(t(table), "table2", closest(key_col >= key_col2), key2 == key2)

When the joining table is already availabe on the database, a string of the table name used as shown above. However, the joining table can also be a TidierDB query, in which case, the query is written as follows

  • @left_join(t(table),query, key)

Examples¤

Examples below will cover how to join tables with different schemas in different databases, and how to write queries on tables and then join them together, and how to do this by levaraging views. Some examples <!–

using TidierDB
db = connect(duckdb())
mtcars = dt(db, "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv")
SQLQuery(true, "", "'https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv' AS mtcars ", "", "", "", "", "", "", false, false, false, 12×4 DataFrame
 Row │ name    type     current_selxn  table_name
     │ String  String   Int64          SubString…
─────┼────────────────────────────────────────────
   1 │ model   VARCHAR              1  mtcars
   2 │ mpg     DOUBLE               1  mtcars
   3 │ cyl     BIGINT               1  mtcars
   4 │ disp    DOUBLE               1  mtcars
   5 │ hp      BIGINT               1  mtcars
   6 │ drat    DOUBLE               1  mtcars
   7 │ wt      DOUBLE               1  mtcars
   8 │ qsec    DOUBLE               1  mtcars
   9 │ vs      BIGINT               1  mtcars
  10 │ am      BIGINT               1  mtcars
  11 │ gear    BIGINT               1  mtcars
  12 │ carb    BIGINT               1  mtcars, false, DuckDB.DB(":memory:"), TidierDB.CTE[], 0, nothing, "", "", 0, false)

–>

Setup¤

using TidierDB
db = connect(duckdb(), "md:")

mtcars = dt(db, "my_db.mtcars")
mt2 = dt(db, "ducks_db.mt2")

Wrangle tables and self join¤

query = @chain mtcars begin
    @group_by cyl
    @summarize begin
        across(mpg, (mean, minimum, maximum))
        num_cars = n()
        end
    @mutate begin
        efficiency = case_when(
            mpg_mean >= 25, "High",
            mpg_mean >= 15, "Moderate",
            "Low" )
      end
end;

query2 = @chain mtcars @filter(mpg>20) @mutate(mpg = mpg *4);

@chain query begin
    @left_join(t(query2), cyl == cyl)
    @summarize(avg_mean = mean(mpg), _by = efficiency)
    @mutate(mean = avg_mean / 4 )
    @collect
end
2×3 DataFrame
Rowefficiencyavg_meanmean
StringFloat64Float64
1Moderate84.533321.1333
2High106.65526.6636

Different schemas¤

To connect to a table in a different schema, prefix it with a dot. For example, "schemaname.tablename". In this query, we are also filtering out cars that contain "M" in the name from the mt2 table before joining.

other_db = @chain dt(db, "ducks_db.mt2") @filter(!str_detect(car, "M"))
@chain mtcars begin
    @left_join(t(other_db), model == car)
    @select(model, fuel_efficiency)
    @head(5)
    @collect
end
5×2 DataFrame
 Row │ model              fuel_efficiency
     │ String             Int64
─────┼────────────────────────────────────
   1 │ Datsun 710                      24
   2 │ Hornet 4 Drive                  18
   3 │ Hornet Sportabout               16
   4 │ Valiant                         15
   5 │ Duster 360                      14

To join directly to the table, you can use the @left_join macro with the table name as a string.

@chain mtcars begin
    @left_join("ducks_db.mt2", model == car)
    @select(model, fuel_efficiency)
    @head(5)
    @collect
end
5×2 DataFrame
 Row │ model              fuel_efficiency
     │ String             Int64
─────┼────────────────────────────────────
   1 │ Datsun 710                      24
   2 │ Hornet 4 Drive                  18
   3 │ Hornet Sportabout               16
   4 │ Valiant                         15
   5 │ Duster 360                      14

Using a View¤

You can also use @create_view to create views and then join them. This is an alternate reuse complex queries.

@chain mtcars begin
    @group_by cyl
    @summarize begin
        across(mpg, (mean, minimum, maximum))
        num_cars = n()
        end
    @mutate begin
        efficiency = case_when(
            mpg_mean >= 25, "High",
            mpg_mean >= 15, "Moderate",
            "Low" )
      end
    @create_view(viewer)
end;

@chain dt(db, "viewer") begin # access the view like any other table
    @left_join(t(query2), cyl == cyl)
    @summarize(avg_mean = mean(mpg), _by = efficiency)
    @mutate(mean = avg_mean / 4 )
    @collect
end
2×3 DataFrame
Rowefficiencyavg_meanmean
StringFloat64Float64
1Moderate84.533321.1333
2High106.65526.6636

AsOf/Rolling join¤

This example reproduces an example in the DuckDB Docs

prices = dt(db, "https://duckdb.org/data/prices.csv", "prices");
holdings = dt(db, "https://duckdb.org/data/holdings.csv", "holdings");
@chain holdings begin
    @inner_join(t(prices), ticker = ticker, closest(when >= when))
    @select(holdings.ticker, holdings.when)
    @mutate(value = price * shares)
    @collect
 end
4×3 DataFrame
Rowtickerwhenvalue
StringDateTimeFloat64
1APPL2001-01-01T00:00:302.94
2APPL2001-01-01T00:01:3048.26
3GOOG2001-01-01T00:00:3023.45
4GOOG2001-01-01T00:01:3021.16

This page was generated using Literate.jl.