Joining Tables
TidierDB supports mutliple join types including equi-joins, nonequi-joins, and as of or rolling joins.
General Syntax¤
All joins share the same argument format
*_join(query, join_table, joining_keys...)
Equi Joins¤
Equi joins can be written in any of the following ways, and the key column will be dropped from the right hand (new) table to avoid duplication.
@left_join(t(table), "table2", key_col)
@left_join(t(table), "table2", key_col = key_col2)
To join mutliple columns, separate the different pairs with a ,
@left_join(t(table), "table2", key_col == key_col2, key2 == key2)
Inequality Joins¤
Inequality joins or non-equi-joins use the same syntax, just with a inequality operators
@left_join(t(table), "table2", key_col >= key_col2, key2 < key2)
AsOf¤
To use an AsOf or rolling join, simply wrap the inequality in `closest. Of note, at this time, only one inequality can be supported at a time with AsOf joins
@left_join(t(table), "table2", closest(key_col >= key_col2), key2 == key2)
When the joining table is already availabe on the database, a string of the table name used as shown above. However, the joining table can also be a TidierDB query, in which case, the query is written as follows
@left_join(t(table),query, key)
Examples¤
Examples below will cover how to join tables with different schemas in different databases, and how to write queries on tables and then join them together, and how to do this by levaraging views. Some examples <!–
using TidierDB
db = connect(duckdb())
mtcars = dt(db, "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv")
SQLQuery(true, "", "'https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv' AS mtcars ", "", "", "", "", "", "", false, false, false, 12×4 DataFrame
Row │ name type current_selxn table_name
│ String String Int64 SubString…
─────┼────────────────────────────────────────────
1 │ model VARCHAR 1 mtcars
2 │ mpg DOUBLE 1 mtcars
3 │ cyl BIGINT 1 mtcars
4 │ disp DOUBLE 1 mtcars
5 │ hp BIGINT 1 mtcars
6 │ drat DOUBLE 1 mtcars
7 │ wt DOUBLE 1 mtcars
8 │ qsec DOUBLE 1 mtcars
9 │ vs BIGINT 1 mtcars
10 │ am BIGINT 1 mtcars
11 │ gear BIGINT 1 mtcars
12 │ carb BIGINT 1 mtcars, false, DuckDB.DB(":memory:"), TidierDB.CTE[], 0, nothing, "", "", 0, false)
–>
Setup¤
using TidierDB
db = connect(duckdb(), "md:")
mtcars = dt(db, "my_db.mtcars")
mt2 = dt(db, "ducks_db.mt2")
Wrangle tables and self join¤
query = @chain mtcars begin
@group_by cyl
@summarize begin
across(mpg, (mean, minimum, maximum))
num_cars = n()
end
@mutate begin
efficiency = case_when(
mpg_mean >= 25, "High",
mpg_mean >= 15, "Moderate",
"Low" )
end
end;
query2 = @chain mtcars @filter(mpg>20) @mutate(mpg = mpg *4);
@chain query begin
@left_join(t(query2), cyl == cyl)
@summarize(avg_mean = mean(mpg), _by = efficiency)
@mutate(mean = avg_mean / 4 )
@collect
end
Row | efficiency | avg_mean | mean |
---|---|---|---|
String | Float64 | Float64 | |
1 | Moderate | 84.5333 | 21.1333 |
2 | High | 106.655 | 26.6636 |
Different schemas¤
To connect to a table in a different schema, prefix it with a dot. For example, "schemaname.tablename". In this query, we are also filtering out cars that contain "M" in the name from the mt2
table before joining.
other_db = @chain dt(db, "ducks_db.mt2") @filter(!str_detect(car, "M"))
@chain mtcars begin
@left_join(t(other_db), model == car)
@select(model, fuel_efficiency)
@head(5)
@collect
end
5×2 DataFrame
Row │ model fuel_efficiency
│ String Int64
─────┼────────────────────────────────────
1 │ Datsun 710 24
2 │ Hornet 4 Drive 18
3 │ Hornet Sportabout 16
4 │ Valiant 15
5 │ Duster 360 14
To join directly to the table, you can use the @left_join
macro with the table name as a string.
@chain mtcars begin
@left_join("ducks_db.mt2", model == car)
@select(model, fuel_efficiency)
@head(5)
@collect
end
5×2 DataFrame
Row │ model fuel_efficiency
│ String Int64
─────┼────────────────────────────────────
1 │ Datsun 710 24
2 │ Hornet 4 Drive 18
3 │ Hornet Sportabout 16
4 │ Valiant 15
5 │ Duster 360 14
Using a View¤
You can also use @create_view
to create views and then join them. This is an alternate reuse complex queries.
@chain mtcars begin
@group_by cyl
@summarize begin
across(mpg, (mean, minimum, maximum))
num_cars = n()
end
@mutate begin
efficiency = case_when(
mpg_mean >= 25, "High",
mpg_mean >= 15, "Moderate",
"Low" )
end
@create_view(viewer)
end;
@chain dt(db, "viewer") begin # access the view like any other table
@left_join(t(query2), cyl == cyl)
@summarize(avg_mean = mean(mpg), _by = efficiency)
@mutate(mean = avg_mean / 4 )
@collect
end
Row | efficiency | avg_mean | mean |
---|---|---|---|
String | Float64 | Float64 | |
1 | Moderate | 84.5333 | 21.1333 |
2 | High | 106.655 | 26.6636 |
AsOf/Rolling join¤
This example reproduces an example in the DuckDB Docs
prices = dt(db, "https://duckdb.org/data/prices.csv", "prices");
holdings = dt(db, "https://duckdb.org/data/holdings.csv", "holdings");
@chain holdings begin
@inner_join(t(prices), ticker = ticker, closest(when >= when))
@select(holdings.ticker, holdings.when)
@mutate(value = price * shares)
@collect
end
Row | ticker | when | value |
---|---|---|---|
String | DateTime | Float64 | |
1 | APPL | 2001-01-01T00:00:30 | 2.94 |
2 | APPL | 2001-01-01T00:01:30 | 48.26 |
3 | GOOG | 2001-01-01T00:00:30 | 23.45 |
4 | GOOG | 2001-01-01T00:01:30 | 21.16 |
This page was generated using Literate.jl.