Flatten

The flatten family of functions aim to “reduce one level” of a object: if you have a dictionary where some values are also dictionaries, we “peel” this inner dictionary and spread it among the original dictionary. This is specially useful when parsing the output of a rest API and transforming it into a dataframe.

Dictionaries and inner dictionaries

Consider the following nested dictionary describing a vehicle:

using TidierIteration

d1 = Dict(
    "id" => 1
    ,"model" => "Kadettão"
    ,"year" => 1998
    ,"details" => Dict(
        "plate_number" => "XXX1234"
        ,"chassi" => 999
        ,"location" => Dict(
            "country" => "Brasil"
            ,"state" => "São Paulo"
            )
        )
    )

Dict{String, Any} with 4 entries:
  "details" => Dict{String, Any}("plate_number"=>"XXX1234", "location"=>Dict("c…
  "model"   => "Kadettão"
  "year"    => 1998
  "id"      => 1

We can flat the inner dictionaries as follows:

d1
flatten(d1, n = 1)

Dict{String, Any} with 6 entries:
  "model"                => "Kadettão"
  "year"                 => 1998
  "id"                   => 1
  "details_chassi"       => 999
  "details_plate_number" => "XXX1234"
  "details_location"     => Dict("country"=>"Brasil", "state"=>"São Paulo")

We can apply the flatten n consecutive times adding n to the end of the function call:

flatten(d1, n = 2)

Dict{String, Any} with 7 entries:
  "details_location_country" => "Brasil"
  "details_location_state"   => "São Paulo"
  "model"                    => "Kadettão"
  "year"                     => 1998
  "details_chassi"           => 999
  "details_plate_number"     => "XXX1234"
  "id"                       => 1

Converting it to a dataframe is simple:

flatten(d1, n = 1) |> DataFrame

1×6 DataFrame

Row	details_chassi	details_location	details_plate_number	id	model	year
	Int64	Dict…	String	Int64	String	Int64
1	999	Dict("country"=>"Brasil", "state"=>"São Paulo")	XXX1234	1	Kadettão	1998

In case of a vector of nested dictionaries, there is the flatten_dfr:

d2 = Dict(
    "id" => 2
    ,"model" => "Monzão"
    ,"year" => 1995
    ,"details" => Dict(
        "plate_number" => "ZZZ1234"
        ,"chassi" => 1234
        ,"location" => Dict(
            "country" => "Brasil"
            ,"state" => "Amazonas"
            )
        )
        ,"stolen" => true
    )

ds = [d1, d2]

flatten_dfr(ds, n = 2)

2×8 DataFrame

Row	details_chassi	details_location_country	details_location_state	details_plate_number	id	model	year	stolen
	Int64	String	String	String	Int64	String	Int64	Bool?
1	999	Brasil	São Paulo	XXX1234	1	Kadettão	1998	missing
2	1234	Brasil	Amazonas	ZZZ1234	2	Monzão	1995	true

If you want to convert the inner dictionaries/arrays to json (useful when saving to a relational database), use the function

flatten_dfr_json(ds, n = 1)

2×7 DataFrame

Row	details_chassi	details_location	details_plate_number	id	model	year	stolen
	Int64	String	String	Int64	String	Int64	Bool?
1	999	{"country":"Brasil","state":"São Paulo"}	XXX1234	1	Kadettão	1998	missing
2	1234	{"country":"Brasil","state":"Amazonas"}	ZZZ1234	2	Monzão	1995	true