@summary

The @summary() macro in TidierData.jl provides a concise way to compute summary statistics for a data frame. Similar to its R counterpart, it reports the minimum, Q1, median, mean, Q3, and maximum of each numerical column, along with the number of non-missing, missing, total, and unique values.

Summary for the whole dataframe

using TidierData

df = DataFrame( A = [1, 2, 3, 4, 5], B = [missing, 7, 8, 9, 10], C = [11, missing, 13, 14, missing], D = [16, 17, 18, 19, 20]);

@chain df begin
    @summary()
end

@summary(df)

4×11 DataFrame

Row	column	min	q1	median	mean	q3	max	non_missing_values	missing_values	total_values	unique_values
	String	Int64	Float64	Float64	Float64	Float64	Int64	Int64	Int64	Int64	Int64
1	A	1	2.0	3.0	3.0	4.0	5	5	0	5	5
2	B	7	7.75	8.5	8.5	9.25	10	4	1	5	4
3	C	11	12.0	13.0	12.6667	13.5	14	3	2	5	3
4	D	16	17.0	18.0	18.0	19.0	20	5	0	5	5
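
As a cross-check, the row for column B can be reproduced by hand with Julia's Statistics standard library. This is only a sketch: it assumes that @summary drops missing values before computing each statistic and that the quartiles follow Julia's default quantile definition. Both assumptions are consistent with the output above but are not stated on this page. The names vals and summary_B are illustrative.

```julia
using Statistics  # standard library: mean, median, quantile

B = [missing, 7, 8, 9, 10]

# Assumption: missing entries are dropped before the statistics are computed
vals = collect(skipmissing(B))

# Collect the same fields that appear in the summary table
summary_B = (
    min = minimum(vals),
    q1 = quantile(vals, 0.25),
    median = median(vals),
    mean = mean(vals),
    q3 = quantile(vals, 0.75),
    max = maximum(vals),
    non_missing_values = length(vals),
    missing_values = count(ismissing, B),
    total_values = length(B),
    unique_values = length(unique(vals)),
)
```

Each field of summary_B should line up with the corresponding entry in row B of the table.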

You can specify columns for which you want to compute the summary. This is useful if the DataFrame has a large number of columns and you're interested in only a subset of them.

@chain df begin
    @summary(B)
end

@summary(df, B)

1×11 DataFrame

Row	column	min	q1	median	mean	q3	max	non_missing_values	missing_values	total_values	unique_values
	String	Int64	Float64	Float64	Float64	Float64	Int64	Int64	Int64	Int64	Int64
1	B	7	7.75	8.5	8.5	9.25	10	4	1	5	4

Or for a range of columns

@chain df begin
    @select(B:D)
    @summary() # alternatively, drop the @select and write @summary(2:4)
end

3×11 DataFrame

Row	column	min	q1	median	mean	q3	max	non_missing_values	missing_values	total_values	unique_values
	String	Int64	Float64	Float64	Float64	Float64	Int64	Int64	Int64	Int64	Int64
1	B	7	7.75	8.5	8.5	9.25	10	4	1	5	4
2	C	11	12.0	13.0	12.6667	13.5	14	3	2	5	3
3	D	16	17.0	18.0	18.0	19.0	20	5	0	5	5
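
Because the result of @summary() is itself a DataFrame, it can in principle be passed on to further steps in the same chain. The sketch below is an illustration rather than something shown on this page: it combines @summary() with TidierData's @filter and @select to list only the columns that contain missing values. The variable name cols_with_missing is hypothetical.

```julia
using TidierData

df = DataFrame(A = [1, 2, 3, 4, 5],
               B = [missing, 7, 8, 9, 10],
               C = [11, missing, 13, 14, missing],
               D = [16, 17, 18, 19, 20])

# Summarize every column, then keep only the rows of the summary
# that describe columns with at least one missing value
cols_with_missing = @chain df begin
    @summary()
    @filter(missing_values > 0)
    @select(column, missing_values)
end
```

Given the summary shown earlier, this should leave only the rows for B and C.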

This page was generated using Literate.jl.