@slice

Slicing rows is similar to filtering rows, except that slicing is based on row numbers rather than filter criteria. In TidierData.jl, slicing works similarly to R's tidyverse in that both positive (which rows to keep) and negative (which rows to remove) slicing are supported. @slice() accepts any valid range of integers (such as a UnitRange); this is not the case for @select() or across().

Remember: Just like every other TidierData.jl top-level macro, @slice() respects grouping. This means that in a grouped data frame, @slice(1:2) will select the first 2 rows from each group.

using TidierData

df = DataFrame(row_num = 1:10,
               a = string.(repeat('a':'e', inner = 2)),
               b = [1,1,1,2,2,2,3,3,3,4])
10×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       1  a           1
   2 │       2  a           1
   3 │       3  b           1
   4 │       4  b           2
   5 │       5  c           2
   6 │       6  c           2
   7 │       7  d           3
   8 │       8  d           3
   9 │       9  e           3
  10 │      10  e           4
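
As a hedged sketch of the grouping behavior noted earlier, the following groups df by a and keeps the first row of each group. It assumes that @group_by() and @ungroup() from TidierData.jl behave like their dplyr counterparts; the name first_per_group is illustrative and does not appear in the original page.

```julia
using TidierData

df = DataFrame(row_num = 1:10,
               a = string.(repeat('a':'e', inner = 2)),
               b = [1,1,1,2,2,2,3,3,3,4])

# Assumption: @slice(1) is applied within each group of `a`,
# so one row per distinct value of `a` should be kept.
first_per_group = @chain df begin
    @group_by(a)
    @slice(1)
    @ungroup
end
```

Because each value of a appears in exactly two rows, five groups are formed, so the result should contain one row per group.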

Slicing using a range of numbers

This is an easy way of retrieving 5 consecutive rows.

@chain df begin
    @slice(1:5)
end
5×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       1  a           1
   2 │       2  a           1
   3 │       3  b           1
   4 │       4  b           2
   5 │       5  c           2

Slicing using a more complex range of numbers

How would we obtain every other row from 1 to 7 (counting up by 2)? Note that range() is similar to seq() in R.

@chain df begin
  @slice(range(start = 1, step = 2, stop = 7))
end
4×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       1  a           1
   2 │       3  b           1
   3 │       5  c           2
   4 │       7  d           3

This same code can also be written using Julia's shorthand start:step:stop syntax for ranges.

@chain df begin
  @slice(1:2:7)
end
4×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       1  a           1
   2 │       3  b           1
   3 │       5  c           2
   4 │       7  d           3
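
To check that the two spellings really describe the same row numbers, here is a small sketch in plain Julia, independent of TidierData.jl. The names r_long and r_short are illustrative. It also shows that a stepped range is a StepRange, while a range with an implicit step of 1 is a UnitRange.

```julia
# Both spellings describe the same sequence of row numbers.
r_long  = range(start = 1, step = 2, stop = 7)  # keyword `start` needs Julia 1.7+
r_short = 1:2:7

println(r_long == r_short)        # true
println(collect(r_short))         # [1, 3, 5, 7]
println(r_short isa StepRange)    # true
println((1:5) isa UnitRange)      # true
```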

Separate multiple row selections with commas

If you have multiple different row selections, you can separate them with commas.

@chain df begin
    @slice(1:5, 10)
end
6×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       1  a           1
   2 │       2  a           1
   3 │       3  b           1
   4 │       4  b           2
   5 │       5  c           2
   6 │      10  e           4

Use n() as short-hand to indicate the number of rows

Select the last 2 rows.

@chain df begin
  @slice(n()-1, n())
end
2×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       9  e           3
   2 │      10  e           4

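You can even use n() inside of ranges, just like in R. Notice that the order of operations is slightly different in Julia as compared to R: in Julia, arithmetic binds more tightly than the : operator, so n()-1:n() is read as (n()-1):n(), whereas in R the : operator binds more tightly than subtraction. This means you don't have to wrap the n()-1 expression inside of parentheses.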

@chain df begin
  @slice(n()-1:n())
end
2×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       9  e           3
   2 │      10  e           4
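
The precedence rule can be checked in plain Julia without TidierData.jl. In this sketch, the variable n is a stand-in for n() on a 10-row data frame, and last_two is an illustrative name.

```julia
n = 10  # stands in for n() on a 10-row data frame

# Arithmetic binds more tightly than `:` in Julia,
# so `n-1:n` is parsed as `(n-1):n`.
last_two = n-1:n

println(last_two == ((n-1):n))  # true
println(collect(last_two))      # [9, 10]
```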

Inverted selection using negative numbers

This line selects all rows except the first 5 rows.

@chain df begin
    @slice(-(1:5))
end
5×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       6  c           2
   2 │       7  d           3
   3 │       8  d           3
   4 │       9  e           3
   5 │      10  e           4
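
The parentheses in -(1:5) matter. The following plain-Julia sketch shows how the expression differs from -1:5, and models negative slicing as taking the complement of the listed row numbers. The setdiff() call is only a conceptual illustration, not a claim about how TidierData.jl implements @slice(); the names neg and kept are illustrative.

```julia
# Unary minus applied to the whole range negates every element...
neg = -(1:5)
println(collect(neg))    # [-1, -2, -3, -4, -5]

# ...whereas without parentheses the minus applies only to the start value.
println(collect(-1:5))   # [-1, 0, 1, 2, 3, 4, 5]

# Conceptually, negative slicing keeps the complement of the listed rows.
kept = setdiff(1:10, 1:5)
println(kept)            # [6, 7, 8, 9, 10]
```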

Sample 5 random rows in the data frame

@chain df begin
  @slice_sample(n = 5)
end
5×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │      10  e           4
   2 │       5  c           2
   3 │       9  e           3
   4 │       8  d           3
   5 │       3  b           1

Slice the min

This line selects all rows with the minimum value of the desired column.

@chain df begin
  @slice_min(b)
end
3×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       1  a           1
   2 │       2  a           1
   3 │       3  b           1

With with_ties = false, only the first row with the minimum value is returned.

@chain df begin
  @slice_min(b, with_ties = false)
end
1×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       1  a           1
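
The difference between keeping and dropping ties can be sketched in plain Julia on the values of column b. This only models the row numbers seen in the two outputs above; it is not TidierData.jl's implementation, and the names all_min_rows and first_min_row are illustrative.

```julia
b = [1,1,1,2,2,2,3,3,3,4]  # column `b` from the example data frame

# With ties kept: every position whose value equals the minimum.
all_min_rows = findall(==(minimum(b)), b)

# Without ties: only the first position of the minimum.
first_min_row = argmin(b)

println(all_min_rows)   # [1, 2, 3]
println(first_min_row)  # 1
```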

Slice the max

The optional prop argument selects a proportion of the rows of the full data frame; here, the half of the rows with the largest values of b.

@chain df begin
  @slice_max(b, prop = 0.5)
end
5×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │      10  e           4
   2 │       7  d           3
   3 │       8  d           3
   4 │       9  e           3
   5 │       4  b           2
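
One way to reason about the row order in this output is the following plain-Julia sketch. It assumes that prop is converted to a row count by multiplying by the number of rows and rounding down, and that ties keep their original order; both are assumptions for illustration, not a description of TidierData.jl internals. The names n_keep and top_rows are illustrative.

```julia
b = [1,1,1,2,2,2,3,3,3,4]  # column `b` from the example data frame
prop = 0.5

# Assumption: the proportion is turned into a row count by rounding down.
n_keep = floor(Int, prop * length(b))

# Row numbers ordered by descending `b`; sortperm lists tied values
# in ascending index order.
top_rows = sortperm(b, rev = true)[1:n_keep]

println(n_keep)    # 5
println(top_rows)  # [10, 7, 8, 9, 4]
```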

Slice the tail

@chain df begin
  @slice_tail(prop = 0.5)
end
5×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       6  c           2
   2 │       7  d           3
   3 │       8  d           3
   4 │       9  e           3
   5 │      10  e           4

Slice the head

@chain df begin
  @slice_head(n = 3)
end
3×3 DataFrame
 Row │ row_num  a       b
     │ Int64    String  Int64
─────┼───────────────────────
   1 │       1  a           1
   2 │       2  a           1
   3 │       3  b           1

This page was generated using Literate.jl.