Flatten
The flatten family of functions aim to “reduce one level” of a object: if you have a dictionary where some values are also dictionaries, we “peel” this inner dictionary and spread it among the original dictionary. This is specially useful when parsing the output of a rest API and transforming it into a dataframe.
Dictionaries and inner dictionaries
Consider the following nested dictionary describing a vehicle:
using TidierIteration
d1 = Dict (
"id" => 1
,"model" => "Kadettão"
,"year" => 1998
,"details" => Dict (
"plate_number" => "XXX1234"
,"chassi" => 999
,"location" => Dict (
"country" => "Brasil"
,"state" => "São Paulo"
)
)
)
Dict{String, Any} with 4 entries:
"details" => Dict{String, Any}("plate_number"=>"XXX1234", "location"=>Dict("c…
"model" => "Kadettão"
"year" => 1998
"id" => 1
We can flat the inner dictionaries as follows:
Dict{String, Any} with 6 entries:
"model" => "Kadettão"
"year" => 1998
"id" => 1
"details_chassi" => 999
"details_plate_number" => "XXX1234"
"details_location" => Dict("country"=>"Brasil", "state"=>"São Paulo")
We can apply the flatten
n
consecutive times adding n
to the end of the function call:
Dict{String, Any} with 7 entries:
"details_location_country" => "Brasil"
"details_location_state" => "São Paulo"
"model" => "Kadettão"
"year" => 1998
"details_chassi" => 999
"details_plate_number" => "XXX1234"
"id" => 1
Converting it to a dataframe is simple:
flatten (d1, n = 1 ) |> DataFrame
1
999
Dict("country"=>"Brasil", "state"=>"São Paulo")
XXX1234
1
Kadettão
1998
In case of a vector of nested dictionaries, there is the flatten_dfr
:
d2 = Dict (
"id" => 2
,"model" => "Monzão"
,"year" => 1995
,"details" => Dict (
"plate_number" => "ZZZ1234"
,"chassi" => 1234
,"location" => Dict (
"country" => "Brasil"
,"state" => "Amazonas"
)
)
,"stolen" => true
)
ds = [d1, d2]
flatten_dfr (ds, n = 2 )
1
999
Brasil
São Paulo
XXX1234
1
Kadettão
1998
missing
2
1234
Brasil
Amazonas
ZZZ1234
2
Monzão
1995
true
If you want to convert the inner dictionaries/arrays to json (useful when saving to a relational database), use the function
flatten_dfr_json (ds, n = 1 )
1
999
{"country":"Brasil","state":"São Paulo"}
XXX1234
1
Kadettão
1998
missing
2
1234
{"country":"Brasil","state":"Amazonas"}
ZZZ1234
2
Monzão
1995
true