@summary

The @summary macro in TidierData.jl provides a concise way to compute summary statistics for a DataFrame. Similar to its R counterpart, it reports the minimum, first quartile (Q1), median, mean, third quartile (Q3), maximum, and number of missing values for one or more numeric columns. The output also includes the counts of non-missing, total, and unique values.

Summary for the whole DataFrame

using TidierData

df = DataFrame( A = [1, 2, 3, 4, 5], B = [missing, 7, 8, 9, 10], C = [11, missing, 13, 14, missing], D = [16, 17, 18, 19, 20]);

@chain df begin
    @summary()
end

@summary(df)
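4×11 DataFrame
 Row │ column  min    q1       median   mean     q3       max    non_missing_values  missing_values  total_values  unique_values
     │ String  Int64  Float64  Float64  Float64  Float64  Int64  Int64               Int64           Int64         Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ A           1      2.0      3.0      3.0      4.0      5                   5               0             5              5
   2 │ B           7     7.75      8.5      8.5     9.25     10                   4               1             5              4
   3 │ C          11     12.0     13.0  12.6667     13.5     14                   3               2             5              3
   4 │ D          16     17.0     18.0     18.0     19.0     20                   5               0             5              5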

You can specify the columns for which to compute the summary. This is useful when the DataFrame has many columns and you are interested in only a subset of them.

@chain df begin
    @summary(B)
end

@summary(df, B)
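1×11 DataFrame
 Row │ column  min    q1       median   mean     q3       max    non_missing_values  missing_values  total_values  unique_values
     │ String  Int64  Float64  Float64  Float64  Float64  Int64  Int64               Int64           Int64         Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ B           7     7.75      8.5      8.5     9.25     10                   4               1             5              4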
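
As a cross-check, the values reported for column B can be reproduced by hand with Julia's Statistics standard library. This is only a sketch of the arithmetic, assuming the quartiles use Julia's default quantile definition (linear interpolation) and that unique values are counted after dropping missing entries, which is consistent with the output shown. It is not TidierData's implementation, and the names vals and stats are illustrative.

```julia
using Statistics  # standard library: mean, median, quantile

# Column B from the example DataFrame (contains one missing value)
B = [missing, 7, 8, 9, 10]

# Drop missing values before computing the statistics
vals = collect(skipmissing(B))

# Collect the same fields that appear in the summary output
stats = (
    min = minimum(vals),
    q1 = quantile(vals, 0.25),
    median = median(vals),
    mean = mean(vals),
    q3 = quantile(vals, 0.75),
    max = maximum(vals),
    non_missing_values = length(vals),
    missing_values = count(ismissing, B),
    total_values = length(B),
    unique_values = length(unique(vals)),
)

println(stats)
```

Swapping in column C (`[11, missing, 13, 14, missing]`) gives a similar check against its row in the whole-DataFrame summary.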

Or for a range of columns:

@chain df begin
    @select(B:D)
    @summary() # alternatively, omit @select and write @summary(2:4)
end
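3×11 DataFrame
 Row │ column  min    q1       median   mean     q3       max    non_missing_values  missing_values  total_values  unique_values
     │ String  Int64  Float64  Float64  Float64  Float64  Int64  Int64               Int64           Int64         Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ B           7     7.75      8.5      8.5     9.25     10                   4               1             5              4
   2 │ C          11     12.0     13.0  12.6667     13.5     14                   3               2             5              3
   3 │ D          16     17.0     18.0     18.0     19.0     20                   5               0             5              5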

This page was generated using Literate.jl.