S3 + DuckDB + TidierDB

TidierDB allows you to leverage DuckDB's built-in support for reading data straight from cloud storage.

Using DuckDB, you can connect to AWS S3 or Google Cloud storage and query files there directly, without making any local copies.

You can also use DBInterface.execute to set up any DuckDB connection you need, and then query that connection with TidierDB.
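A minimal sketch of that approach follows. It assumes a DuckDB version with the secrets manager (`CREATE SECRET`, DuckDB 0.10 or later), network access the first time the httpfs extension is installed, and that `DBInterface` is in scope after `using TidierDB`, as the paragraph above implies. The helper name `connect_s3` and the secret name `aws_creds` are hypothetical, and the credentials are the same "string" placeholders used in the AWS example.

```julia
using TidierDB

# Hypothetical helper (not part of TidierDB): opens a DuckDB connection and
# registers S3 credentials with DuckDB's secrets manager using plain SQL.
function connect_s3(key_id::AbstractString, secret::AbstractString, region::AbstractString)
    db = connect(duckdb())                       # in-memory DuckDB connection via TidierDB
    DBInterface.execute(db, "INSTALL httpfs;")   # extension that provides s3:// access
    DBInterface.execute(db, "LOAD httpfs;")
    # Values are interpolated unescaped; a sketch only, see the note below
    DBInterface.execute(db, """
        CREATE SECRET aws_creds (
            TYPE S3,
            KEY_ID '$key_id',
            SECRET '$secret',
            REGION '$region'
        );
    """)
    return db
end

# Placeholder credentials, as in the AWS example
db = connect_s3("string", "string", "us-east-1")

# The configured connection is then queried with TidierDB like any other, e.g.:
# @chain db_table(db, "s3://path/to_data.csv") begin
#     @group_by(cyl)
#     @summarize(mpg = mean(mpg))
#     @collect
# end
```

Interpolating the credentials directly into the SQL string keeps the sketch short, but a value containing a single quote would break the statement, so real code should validate or escape those inputs.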

using TidierDB

# Connect to Google Cloud via DuckDB
# google_db = connect(duckdb(), :gbq, access_key="string", secret_key="string")

# Connect to AWS via DuckDB
aws_db = connect(duckdb(), :aws, aws_access_key_id="string",
                                 aws_secret_access_key="string",
                                 aws_region="us-east-1")
s3_csv_path = "s3://path/to_data.csv"

@chain db_table(aws_db, s3_csv_path) begin
    @filter(!starts_with(column1, "M"))   # drop rows whose column1 value starts with "M"
    @group_by(cyl)
    @summarize(mpg = mean(mpg))
    @mutate(mpg_squared = mpg^2,
            mpg_rounded = round(mpg),
            mpg_efficiency = case_when(
                mpg >= cyl^2, "efficient",
                mpg < 15.2, "inefficient",
                "moderate"))
    @filter(mpg_efficiency in ("moderate", "efficient"))
    @arrange(desc(mpg_rounded))
    @collect
end
2×5 DataFrame
 Row │ cyl     mpg       mpg_squared  mpg_rounded  mpg_efficiency
     │ Int64?  Float64?  Float64?     Float64?     String?
─────┼────────────────────────────────────────────────────────────
   1 │      4   27.3444      747.719         27.0  efficient
   2 │      6   19.7333      389.404         20.0  moderate

This page was generated using Literate.jl.