@slice

Slicing rows is similar to filtering rows, except that slicing is based on row numbers rather than filter criteria. In TidierData.jl, slicing works similarly to R's tidyverse in that both positive (which rows to keep) and negative (which rows to remove) slicing are supported. For @slice(), any valid UnitRange of integers is accepted; this is not the case for @select() or across().

Remember: Just like every other TidierData.jl top-level macro, @slice() respects groups. This means that in a grouped data frame, @slice(1:2) will select the first 2 rows from each group.

using TidierData

df = DataFrame(row_num = 1:10,
               a = string.(repeat('a':'e', inner = 2)),
               b = [1,1,1,2,2,2,3,3,3,4])

10×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	1	a	1
2	2	a	1
3	3	b	1
4	4	b	2
5	5	c	2
6	6	c	2
7	7	d	3
8	8	d	3
9	9	e	3
10	10	e	4
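
The note above about grouped data frames can be sketched as follows. This is a minimal illustration, assuming @group_by() from TidierData.jl is applied before @slice(); the names first_per_group and flat are hypothetical, and because the result may still carry its grouping, it is converted with DataFrame() before inspection.

```julia
using TidierData

df = DataFrame(row_num = 1:10,
               a = string.(repeat('a':'e', inner = 2)),
               b = [1,1,1,2,2,2,3,3,3,4])

# Group by `a` (five groups of two rows each), then keep the first row of each group.
first_per_group = @chain df begin
    @group_by(a)
    @slice(1)
end

# Assumption: the result may still be grouped; DataFrame() flattens it either way.
flat = DataFrame(first_per_group)
```

Since each value of a appears in exactly two rows, this should yield one row per group rather than a single row overall.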

Slicing using a range of numbers

This is an easy way of retrieving 5 consecutive rows.

@chain df begin
    @slice(1:5)
end

5×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	1	a	1
2	2	a	1
3	3	b	1
4	4	b	2
5	5	c	2

Slicing using a more complex UnitRange of numbers

How would we obtain every other row from 1 to 7 (counting up by 2)? Note that range() is similar to seq() in R.

@chain df begin
  @slice(range(start = 1, step = 2, stop = 7))
end

4×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	1	a	1
2	3	b	1
3	5	c	2
4	7	d	3

This same code can also be written using Julia's shorthand range syntax, start:step:stop.

@chain df begin
  @slice(1:2:7)
end

4×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	1	a	1
2	3	b	1
3	5	c	2
4	7	d	3
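
To see how the two spellings relate, the ranges can be inspected in plain Julia, outside of @slice(). This is only a sketch of the range values themselves; the variable names are illustrative.

```julia
# Plain Julia, no packages needed.
unit = 1:5                                          # start:stop
stepped = 1:2:7                                     # start:step:stop
from_range = range(start = 1, step = 2, stop = 7)   # keyword form of the same range

println(typeof(unit))            # a UnitRange type
println(typeof(stepped))         # a StepRange type
println(collect(stepped))        # the row numbers the range expands to
println(stepped == from_range)   # both spellings describe the same range
```

Strictly speaking, a range with a step other than 1 is a StepRange rather than a UnitRange in Julia; the examples above show that @slice() accepts both.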

Separate multiple row selections with commas

If you have multiple different row selections, you can separate them with commas.

@chain df begin
    @slice(1:5, 10)
end

6×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	1	a	1
2	2	a	1
3	3	b	1
4	4	b	2
5	5	c	2
6	10	e	4

Use `n()` as short-hand to indicate the number of rows

Select the last 2 rows.

@chain df begin
  @slice(n()-1, n())
end

2×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	9	e	3
2	10	e	4

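You can even use n() inside of UnitRanges, just like in R. Note that operator precedence differs between the two languages: in Julia, arithmetic operators such as - bind more tightly than the range operator :, so n()-1:n() is read as (n()-1):n() and you don't have to wrap the n()-1 expression in parentheses, as you would in R.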

@chain df begin
  @slice(n()-1:n())
end

2×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	9	e	3
2	10	e	4
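
The precedence rule can be checked in plain Julia. In this sketch the integer n is a hypothetical stand-in for n(), set to the 10 rows of df.

```julia
n = 10   # stand-in for n(), the number of rows in `df`

# Julia evaluates the subtraction first, then builds the range.
last_two = n-1:n
println(last_two)
println(last_two == ((n - 1):n))

# For contrast, R reads n-1:n as n - (1:n); the Julia equivalent of that reading is:
r_style = n .- (1:n)
println(collect(r_style))
```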

Inverted selection using negative numbers

This line selects all rows except the first 5 rows.

@chain df begin
    @slice(-(1:5))
end

5×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	6	c	2
2	7	d	3
3	8	d	3
4	9	e	3
5	10	e	4
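
Conceptually, a negative selection keeps every row number that is not listed. The plain-Julia sketch below illustrates that idea on row numbers alone; it is not how TidierData.jl implements @slice(), and all variable names are illustrative.

```julia
all_rows = 1:10              # row numbers of the 10-row `df`
negated = -(1:5)             # the expression passed to @slice() above
dropped = abs.(negated)      # the rows to remove: 1 through 5
kept = setdiff(all_rows, dropped)

println(negated)   # negating a range gives a range of negative numbers
println(kept)      # the remaining row numbers
```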

Sample 5 random rows in the data frame

@chain df begin
  @slice_sample(n = 5)
end

5×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	10	e	4
2	2	a	1
3	8	d	3
4	9	e	3
5	4	b	2

Slice the min

This line selects all rows with the minimum value of the desired column.

@chain df begin
  @slice_min(b)
end

3×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	1	a	1
2	2	a	1
3	3	b	1

With with_ties = false, only the first of the tied rows is returned.

@chain df begin
  @slice_min(b, with_ties = false)
end

1×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	1	a	1
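
The difference between keeping and dropping ties can be illustrated in plain Julia on the b column alone. This is a sketch of the idea under the assumption that ties are resolved in row order, not the package's implementation; tied_rows and first_row are illustrative names.

```julia
b = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]   # the `b` column of `df`

# With ties: every row whose value equals the minimum.
tied_rows = findall(==(minimum(b)), b)

# Without ties: only the first row holding the minimum.
first_row = argmin(b)

println(tied_rows)
println(first_row)
```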

Slice the max

The optional prop argument will slice a proportion of the full data frame.

@chain df begin
  @slice_max(b, prop = 0.5)
end

5×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	10	e	4
2	7	d	3
3	8	d	3
4	9	e	3
5	4	b	2
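
One way to think about prop is as a row count derived from the size of the data frame: 0.5 of 10 rows is 5 rows, taken from the largest values of b downward. The plain-Julia sketch below illustrates this on the b column; how TidierData.jl rounds the count when it is not a whole number is an assumption here (rounding down), and the variable names are illustrative.

```julia
b = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]   # the `b` column of `df`
prop = 0.5

# Assumption: a fractional row count is rounded down.
n_keep = floor(Int, prop * length(b))

# Row numbers ordered from the largest value of `b` to the smallest.
order = sortperm(b, rev = true)
top_rows = order[1:n_keep]

println(n_keep)
println(b[top_rows])   # the values of `b` in the selected rows
```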

Slice the tail

@chain df begin
  @slice_tail(prop = 0.5)
end

5×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	6	c	2
2	7	d	3
3	8	d	3
4	9	e	3
5	10	e	4

Slice the head

@chain df begin
  @slice_head(n = 3)
end

3×3 DataFrame

Row	row_num	a	b
	Int64	String	Int64
1	1	a	1
2	2	a	1
3	3	b	1

This page was generated using Literate.jl.
