@summary
The @summary() macro in TidierData.jl provides a concise way to compute summary statistics on data. Similar to its R counterpart, it reports the minimum, Q1, median, mean, Q3, maximum, count of non-missing values, and number of missing values for one or more numerical columns.
Summary for the whole DataFrame
using TidierData
df = DataFrame( A = [1, 2, 3, 4, 5], B = [missing, 7, 8, 9, 10], C = [11, missing, 13, 14, missing], D = [16, 17, 18, 19, 20]);
# Pipe df into @summary() with @chain ...
@chain df begin
    @summary()
end
# ... or, equivalently, pass df as the first argument.
@summary(df)
4×9 DataFrame
| Row | Column | Min | Q1 | Median | Mean | Q3 | Max | Count | Missing_Count |
|---|---|---|---|---|---|---|---|---|---|
| | String | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 | Int64 |
| 1 | A | 1 | 2.0 | 3.0 | 3.0 | 4.0 | 5 | 5 | 0 |
| 2 | B | 7 | 7.75 | 8.5 | 8.5 | 9.25 | 10 | 4 | 1 |
| 3 | C | 11 | 12.0 | 13.0 | 12.6667 | 13.5 | 14 | 3 | 2 |
| 4 | D | 16 | 17.0 | 18.0 | 18.0 | 19.0 | 20 | 5 | 0 |
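To see where the numbers in a row come from, the sketch below recomputes the same statistics for a single column using only Julia's Statistics standard library. The helper `manual_summary` is hypothetical and not part of TidierData. It assumes, based on the output above, that every statistic except `Missing_Count` is computed after dropping missing values. It is not the macro's actual implementation.

```julia
using Statistics  # standard library: mean, median, quantile

# Hypothetical helper, not part of TidierData: summarize one numeric vector,
# ignoring missing values for every statistic except the missing count.
function manual_summary(v)
    vals = collect(skipmissing(v))  # drop missing entries
    return (
        Min = minimum(vals),
        Q1 = quantile(vals, 0.25),
        Median = median(vals),
        Mean = mean(vals),
        Q3 = quantile(vals, 0.75),
        Max = maximum(vals),
        Count = length(vals),                # non-missing values only
        Missing_Count = count(ismissing, v), # counted on the original vector
    )
end

# Same values as column B in the example DataFrame
manual_summary([missing, 7, 8, 9, 10])
```

Applied to the values of column B, this should reproduce the row for B in the table above: the four non-missing values 7, 8, 9, 10 give Q1 = 7.75 and Q3 = 9.25 under Julia's default quantile interpolation.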
You can specify columns for which you want to compute the summary. This is useful if the DataFrame has a large number of columns and you're interested in only a subset of them.
# Summarize only column B, using @chain ...
@chain df begin
    @summary(B)
end
# ... or, equivalently, with df as the first argument.
@summary(df, B)
1×9 DataFrame
| Row | Column | Min | Q1 | Median | Mean | Q3 | Max | Count | Missing_Count |
|---|---|---|---|---|---|---|---|---|---|
| | String | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 | Int64 |
| 1 | B | 7 | 7.75 | 8.5 | 8.5 | 9.25 | 10 | 4 | 1 |
Or for a range of columns
@chain df begin
    @select(B:D)
    @summary() # alternatively, drop @select and write @summary(2:4)
end
3×9 DataFrame
| Row | Column | Min | Q1 | Median | Mean | Q3 | Max | Count | Missing_Count |
|---|---|---|---|---|---|---|---|---|---|
| | String | Int64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 | Int64 |
| 1 | B | 7 | 7.75 | 8.5 | 8.5 | 9.25 | 10 | 4 | 1 |
| 2 | C | 11 | 12.0 | 13.0 | 12.6667 | 13.5 | 14 | 3 | 2 |
| 3 | D | 16 | 17.0 | 18.0 | 18.0 | 19.0 | 20 | 5 | 0 |
This page was generated using Literate.jl.