Flatten

The flatten family of functions aim to “reduce one level” of a object: if you have a dictionary where some values are also dictionaries, we “peel” this inner dictionary and spread it among the original dictionary. This is specially useful when parsing the output of a rest API and transforming it into a dataframe.

Dictionaries and inner dictionaries

Consider the following nested dictionary describing a vehicle:

using TidierIteration

d1 = Dict(
    "id" => 1
    ,"model" => "Kadettão"
    ,"year" => 1998
    ,"details" => Dict(
        "plate_number" => "XXX1234"
        ,"chassi" => 999
        ,"location" => Dict(
            "country" => "Brasil"
            ,"state" => "São Paulo"
            )
        )
    )
Dict{String, Any} with 4 entries:
  "details" => Dict{String, Any}("plate_number"=>"XXX1234", "location"=>Dict("c…
  "model"   => "Kadettão"
  "year"    => 1998
  "id"      => 1

We can flat the inner dictionaries as follows:

d1
flatten(d1, n = 1)
Dict{String, Any} with 6 entries:
  "model"                => "Kadettão"
  "year"                 => 1998
  "id"                   => 1
  "details_chassi"       => 999
  "details_plate_number" => "XXX1234"
  "details_location"     => Dict("country"=>"Brasil", "state"=>"São Paulo")

We can apply the flatten n consecutive times adding n to the end of the function call:

flatten(d1, n = 2)
Dict{String, Any} with 7 entries:
  "details_location_country" => "Brasil"
  "details_location_state"   => "São Paulo"
  "model"                    => "Kadettão"
  "year"                     => 1998
  "details_chassi"           => 999
  "details_plate_number"     => "XXX1234"
  "id"                       => 1

Converting it to a dataframe is simple:

flatten(d1, n = 1) |> DataFrame
1×6 DataFrame
Row details_chassi details_location details_plate_number id model year
Int64 Dict… String Int64 String Int64
1 999 Dict("country"=>"Brasil", "state"=>"São Paulo") XXX1234 1 Kadettão 1998

In case of a vector of nested dictionaries, there is the flatten_dfr:

d2 = Dict(
    "id" => 2
    ,"model" => "Monzão"
    ,"year" => 1995
    ,"details" => Dict(
        "plate_number" => "ZZZ1234"
        ,"chassi" => 1234
        ,"location" => Dict(
            "country" => "Brasil"
            ,"state" => "Amazonas"
            )
        )
        ,"stolen" => true
    )

ds = [d1, d2]

flatten_dfr(ds, n = 2)
2×8 DataFrame
Row details_chassi details_location_country details_location_state details_plate_number id model year stolen
Int64 String String String Int64 String Int64 Bool?
1 999 Brasil São Paulo XXX1234 1 Kadettão 1998 missing
2 1234 Brasil Amazonas ZZZ1234 2 Monzão 1995 true

If you want to convert the inner dictionaries/arrays to json (useful when saving to a relational database), use the function

flatten_dfr_json(ds, n = 1)
2×7 DataFrame
Row details_chassi details_location details_plate_number id model year stolen
Int64 String String Int64 String Int64 Bool?
1 999 {"country":"Brasil","state":"São Paulo"} XXX1234 1 Kadettão 1998 missing
2 1234 {"country":"Brasil","state":"Amazonas"} ZZZ1234 2 Monzão 1995 true