@select

The @select() macro in TidierData.jl supports many of the nuances of the R tidyverse implementation: selecting columns individually by name or number, selecting ranges of columns with the : operator between column names or numbers, and negative selection using negated column names or numbers. The selection helpers starts_with(), ends_with(), matches(), and contains() are also supported.

using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");

Select the first 5 columns individually by name

@chain movies begin
    @select(Title, Year, Length, Budget, Rating)
    @slice(1:5)
end
5×5 DataFrame
 Row │ Title                     Year   Length  Budget   Rating
     │ String                    Int32  Int32   Int32?   Float64
─────┼──────────────────────────────────────────────────────────
   1 │ $                          1971     121  missing      6.4
   2 │ $1000 a Touchdown          1939      71  missing      6.0
   3 │ $21 a Day Once a Month     1941       7  missing      8.2
   4 │ $40,000                    1996      70  missing      8.2
   5 │ $50,000 Climax Show, The   1975      71  missing      3.4

Select the first 5 columns individually by number

@chain movies begin
    @select(1, 2, 3, 4, 5)
    @slice(1:5)
end
5×5 DataFrame
 Row │ Title                     Year   Length  Budget   Rating
     │ String                    Int32  Int32   Int32?   Float64
─────┼──────────────────────────────────────────────────────────
   1 │ $                          1971     121  missing      6.4
   2 │ $1000 a Touchdown          1939      71  missing      6.0
   3 │ $21 a Day Once a Month     1941       7  missing      8.2
   4 │ $40,000                    1996      70  missing      8.2
   5 │ $50,000 Climax Show, The   1975      71  missing      3.4

Select the first 5 columns by name (using a range)

@chain movies begin
    @select(Title:Rating)
    @slice(1:5)
end
5×5 DataFrame
 Row │ Title                     Year   Length  Budget   Rating
     │ String                    Int32  Int32   Int32?   Float64
─────┼──────────────────────────────────────────────────────────
   1 │ $                          1971     121  missing      6.4
   2 │ $1000 a Touchdown          1939      71  missing      6.0
   3 │ $21 a Day Once a Month     1941       7  missing      8.2
   4 │ $40,000                    1996      70  missing      8.2
   5 │ $50,000 Climax Show, The   1975      71  missing      3.4

Select the first 5 columns by number (using a range)

@chain movies begin
    @select(1:5)
    @slice(1:5)
end
5×5 DataFrame
 Row │ Title                     Year   Length  Budget   Rating
     │ String                    Int32  Int32   Int32?   Float64
─────┼──────────────────────────────────────────────────────────
   1 │ $                          1971     121  missing      6.4
   2 │ $1000 a Touchdown          1939      71  missing      6.0
   3 │ $21 a Day Once a Month     1941       7  missing      8.2
   4 │ $40,000                    1996      70  missing      8.2
   5 │ $50,000 Climax Show, The   1975      71  missing      3.4

Select all but the first 5 columns by name

Here we will limit the results to the first 5 remaining columns and the first 5 rows for the sake of brevity.

@chain movies begin
    @select(-(Title:Rating))
    @select(1:5)
    @slice(1:5)
end
5×5 DataFrame
 Row │ Votes  R1       R2       R3       R4
     │ Int32  Float64  Float64  Float64  Float64
─────┼──────────────────────────────────────────
   1 │   348      4.5      4.5      4.5      4.5
   2 │    20      0.0     14.5      4.5     24.5
   3 │     5      0.0      0.0      0.0      0.0
   4 │     6     14.5      0.0      0.0      0.0
   5 │    17     24.5      4.5      0.0     14.5

We can also use ! for inverted selection instead of -.

@chain movies begin
    @select(!(Title:Rating))
    @select(1:5)
    @slice(1:5)
end
5×5 DataFrame
 Row │ Votes  R1       R2       R3       R4
     │ Int32  Float64  Float64  Float64  Float64
─────┼──────────────────────────────────────────
   1 │   348      4.5      4.5      4.5      4.5
   2 │    20      0.0     14.5      4.5     24.5
   3 │     5      0.0      0.0      0.0      0.0
   4 │     6     14.5      0.0      0.0      0.0
   5 │    17     24.5      4.5      0.0     14.5
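
The introduction also mentions negating individual column names, which the range-based examples in this section do not show. The following is a minimal sketch, assuming the unary minus applies to a single bare column name the same way it applies to a range; the variable name no_title is only illustrative.

```julia
using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");

# Drop only the Title column (assumption: -Title negates a single
# column just as -(Title:Rating) negates a range).
no_title = @chain movies begin
    @select(-Title)
    @slice(1:5)
end

# Inspect which columns remain.
names(no_title)
```

If this behaves as assumed, the result keeps every column except Title, in the original column order.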

Select all but the first 5 columns by number

We will again limit the results to the first 5 remaining columns and the first 5 rows for the sake of brevity.

@chain movies begin
    @select(-(1:5))
    @select(1:5)
    @slice(1:5)
end
5×5 DataFrame
 Row │ Votes  R1       R2       R3       R4
     │ Int32  Float64  Float64  Float64  Float64
─────┼──────────────────────────────────────────
   1 │   348      4.5      4.5      4.5      4.5
   2 │    20      0.0     14.5      4.5     24.5
   3 │     5      0.0      0.0      0.0      0.0
   4 │     6     14.5      0.0      0.0      0.0
   5 │    17     24.5      4.5      0.0     14.5

Mix and match selection

Just like in R's tidyverse, you can separate multiple selections with commas and mix and match different ways of selecting columns.

@chain movies begin
    @select(1, Budget:Rating)
    @slice(1:5)
end
5×3 DataFrame
 Row │ Title                     Budget   Rating
     │ String                    Int32?   Float64
─────┼───────────────────────────────────────────
   1 │ $                         missing      6.4
   2 │ $1000 a Touchdown         missing      6.0
   3 │ $21 a Day Once a Month    missing      8.2
   4 │ $40,000                   missing      8.2
   5 │ $50,000 Climax Show, The  missing      3.4
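
The selection helpers named in the introduction (starts_with(), ends_with(), matches(), and contains()) can be mixed into a selection in the same way. The sketch below is illustrative rather than definitive: it assumes each helper takes a plain string literal, that matches() treats its argument as a regular-expression pattern, and the result variable names (r_cols, mixed, vote_shares) are hypothetical.

```julia
using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");

# Columns whose names begin with "R" (the match is assumed to be case-sensitive).
r_cols = @chain movies begin
    @select(starts_with("R"))
    @slice(1:5)
end

# Helpers combined with a bare column name in one comma-separated selection.
mixed = @chain movies begin
    @select(Title, ends_with("ing"), contains("Bud"))
    @slice(1:5)
end

# matches() with a pattern for "R" followed by a digit, which should pick up
# the numbered R columns but not Rating.
vote_shares = @chain movies begin
    @select(Title, matches("^R[0-9]"))
    @slice(1:5)
end

# Inspect the selected column names.
names(r_cols), names(mixed), names(vote_shares)
```

Checking names() on each result is a quick way to confirm which columns a helper actually picked up before using it in a longer chain.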

This page was generated using Literate.jl.