@select

The @select() macro in TidierData.jl supports many of the nuances of the R tidyverse implementation: selecting columns individually by name or number, selecting ranges of columns with the : operator between column names or numbers, and negative selection by negating column names, numbers, or ranges with - or !. Selection helpers such as starts_with(), ends_with(), matches(), and contains() are also supported.

using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");

Select the first 5 columns individually by name

@chain movies begin
    @select(Title, Year, Length, Budget, Rating)
    @slice(1:5)
end

5×5 DataFrame

Row	Title	Year	Length	Budget	Rating
	String	Int32	Int32	Int32?	Float64
1	$	1971	121	missing	6.4
2	$1000 a Touchdown	1939	71	missing	6.0
3	$21 a Day Once a Month	1941	7	missing	8.2
4	$40,000	1996	70	missing	8.2
5	$50,000 Climax Show, The	1975	71	missing	3.4

Select the first 5 columns individually by number

@chain movies begin
    @select(1, 2, 3, 4, 5)
    @slice(1:5)
end

5×5 DataFrame

Row	Title	Year	Length	Budget	Rating
	String	Int32	Int32	Int32?	Float64
1	$	1971	121	missing	6.4
2	$1000 a Touchdown	1939	71	missing	6.0
3	$21 a Day Once a Month	1941	7	missing	8.2
4	$40,000	1996	70	missing	8.2
5	$50,000 Climax Show, The	1975	71	missing	3.4

Select the first 5 columns by name (using a range)

@chain movies begin
    @select(Title:Rating)
    @slice(1:5)
end

5×5 DataFrame

Row	Title	Year	Length	Budget	Rating
	String	Int32	Int32	Int32?	Float64
1	$	1971	121	missing	6.4
2	$1000 a Touchdown	1939	71	missing	6.0
3	$21 a Day Once a Month	1941	7	missing	8.2
4	$40,000	1996	70	missing	8.2
5	$50,000 Climax Show, The	1975	71	missing	3.4

Select the first 5 columns by number (using a range)

@chain movies begin
    @select(1:5)
    @slice(1:5)
end

5×5 DataFrame

Row	Title	Year	Length	Budget	Rating
	String	Int32	Int32	Int32?	Float64
1	$	1971	121	missing	6.4
2	$1000 a Touchdown	1939	71	missing	6.0
3	$21 a Day Once a Month	1941	7	missing	8.2
4	$40,000	1996	70	missing	8.2
5	$50,000 Climax Show, The	1975	71	missing	3.4

Select all but the first 5 columns by name

Here we will limit the results to the first 5 remaining columns and the first 5 rows for the sake of brevity.

@chain movies begin
    @select(-(Title:Rating))
    @select(1:5)
    @slice(1:5)
end

5×5 DataFrame

Row	Votes	R1	R2	R3	R4
	Int32	Float64	Float64	Float64	Float64
1	348	4.5	4.5	4.5	4.5
2	20	0.0	14.5	4.5	24.5
3	5	0.0	0.0	0.0	0.0
4	6	14.5	0.0	0.0	0.0
5	17	24.5	4.5	0.0	14.5

We can also use ! for inverted selection instead of -.

@chain movies begin
    @select(!(Title:Rating))
    @select(1:5)
    @slice(1:5)
end

5×5 DataFrame

Row	Votes	R1	R2	R3	R4
	Int32	Float64	Float64	Float64	Float64
1	348	4.5	4.5	4.5	4.5
2	20	0.0	14.5	4.5	24.5
3	5	0.0	0.0	0.0	0.0
4	6	14.5	0.0	0.0	0.0
5	17	24.5	4.5	0.0	14.5

Select all but the first 5 columns by number

We will again limit the results to the first 5 remaining columns and the first 5 rows for the sake of brevity.

@chain movies begin
    @select(-(1:5))
    @select(1:5)
    @slice(1:5)
end

5×5 DataFrame

Row	Votes	R1	R2	R3	R4
	Int32	Float64	Float64	Float64	Float64
1	348	4.5	4.5	4.5	4.5
2	20	0.0	14.5	4.5	24.5
3	5	0.0	0.0	0.0	0.0
4	6	14.5	0.0	0.0	0.0
5	17	24.5	4.5	0.0	14.5

Mix and match selection

Just like in R's tidyverse, you can separate multiple selections with commas and mix and match different ways of selecting columns.

@chain movies begin
    @select(1, Budget:Rating)
    @slice(1:5)
end

5×3 DataFrame

Row	Title	Budget	Rating
	String	Int32?	Float64
1	$	missing	6.4
2	$1000 a Touchdown	missing	6.0
3	$21 a Day Once a Month	missing	8.2
4	$40,000	missing	8.2
5	$50,000 Climax Show, The	missing	3.4
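
Select columns using selection helpers

The introduction mentions starts_with(), ends_with(), matches(), and contains(), but none of the examples on this page use them. The following is a minimal sketch rather than output taken from the package documentation. It assumes a small made-up data frame (df, with lowercase column names chosen purely for illustration) instead of the movies dataset, so that the matched columns are easy to check by eye. It also assumes that matches() accepts a regular expression written as a plain string.

```julia
using TidierData
using DataFrames

# Hypothetical toy data: the column names are invented for this sketch.
df = DataFrame(
    rating = [6.4, 6.0],
    r1     = [4.5, 0.0],
    r2     = [4.5, 14.5],
    votes  = [348, 20],
)

# Columns whose names begin with "r" (intended: rating, r1, r2).
by_prefix = @chain df begin
    @select(starts_with("r"))
end

# Columns whose names end with "s" (intended: votes).
by_suffix = @select(df, ends_with("s"))

# Columns whose names contain the substring "ot" (intended: votes).
by_substring = @select(df, contains("ot"))

# Columns whose names match a regular expression (intended: r1, r2).
by_regex = @select(df, matches("^r[0-9]"))

# Helpers can be separated by commas like any other selection.
combined = @select(df, contains("ot"), matches("^r[0-9]"))

println(names(by_prefix))
println(names(by_suffix))
println(names(by_substring))
println(names(by_regex))
println(names(combined))
```

Calling names() on each result is a quick way to confirm which columns a helper picked up before applying it to a wide table such as movies.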

This page was generated using Literate.jl.