1  Operations on rows

In this chapter we will see operations that deal with rows, be it ordering or throwing some rows away.

The following is necessary to run all examples:

using DataFrames, PalmerPenguins
using Tidier
import DataFramesMeta as DFM

penguins = PalmerPenguins.load() |> DataFrame;
@slice_head(penguins, n = 10)
10×7 DataFrame
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing

1.1 Filtering (or: throwing rows away)

To filter a dataframe means keeping only the rows that satisfy a certain criteria (ie. a boolean condition).

To filter in Tidier, we use the macro @filter. You can use it in the form

@filter(penguins, species == "Adelie")
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male

or without parentesis as in

@filter penguins species == "Adelie"
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male

Notice that the columns are typed as if they were variables on the Julia environment. This is inspired by the tidyverse behaviour of data-masking: inside a tidyverse verb, the columns are taken as “statistical variables” that exist inside the dataframe as columns.

In DataFramesMeta, we have two macros for filtering: @subset and @rsubset. Use the first when you have some criteria that uses a whole column, for example:

DFM.@subset penguins :body_mass_g .>= mean(skipmissing(:body_mass_g))
149×7 DataFrame
124 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.2 19.6 195 4675 male
2 Adelie Torgersen 42.0 20.2 190 4250 missing
3 Adelie Torgersen 34.6 21.1 198 4400 male
4 Adelie Torgersen 42.5 20.7 197 4500 male
5 Adelie Dream 39.8 19.1 184 4650 male
6 Adelie Dream 44.1 19.7 196 4400 male
7 Adelie Dream 39.6 18.8 190 4600 male
8 Adelie Biscoe 40.1 18.9 188 4300 male
9 Adelie Biscoe 41.3 21.1 195 4400 male
10 Adelie Torgersen 41.8 19.4 198 4450 male
11 Adelie Torgersen 42.8 18.5 195 4250 male
12 Adelie Torgersen 42.9 17.6 196 4700 male
13 Adelie Dream 41.1 18.1 205 4300 male
138 Gentoo Biscoe 47.2 13.7 214 4925 female
139 Gentoo Biscoe 46.8 14.3 215 4850 female
140 Gentoo Biscoe 50.4 15.7 222 5750 male
141 Gentoo Biscoe 45.2 14.8 212 5200 female
142 Gentoo Biscoe 49.9 16.1 213 5400 male
143 Chinstrap Dream 49.2 18.2 195 4400 male
144 Chinstrap Dream 52.8 20.0 205 4550 male
145 Chinstrap Dream 54.2 20.8 201 4300 male
146 Chinstrap Dream 52.0 20.7 210 4800 male
147 Chinstrap Dream 53.5 19.9 205 4500 male
148 Chinstrap Dream 50.8 18.5 201 4450 male
149 Chinstrap Dream 49.0 19.6 212 4300 male

Notice the broadcast on >=. We need it because each variable is interpreted as a vector (the whole column). Also, notice that we refer to columns as symbols (i.e. we append : to it).

In the above example, we needed the whole column body_mass_g to take the mean and then filter the rows based on that. If, however, your filtering criteria only uses information about each row (without needing to see it in context of the whole column), then @rsubset (row subset) is easier to use: it interprets each columns as a value (not an array), so no broadcasting is needed:

DFM.@rsubset penguins :species == "Adelie"
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male

In both Tidier and DataFramesMeta, only the rows to which the criteria is true are returned. This means that false and missing are thrown away.

In pure DataFrames, we use the subset function, and the criteria is passed with the notation

subset(penguins, :column => boolean_function)

where boolean_function is a boolean (with possibly missing values) function on 1 variable (the :column you passed). Add the kwarg skipmissing=true if you want to get rid of missing values.

1.1.1 Filtering with one criteria

Problem: Filtering all the rows with species == “Adelie”.

@filter penguins species == "Adelie"
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male
DFM.@rsubset penguins :species == "Adelie"
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male
subset(penguins, :species => x -> x .== "Adelie", skipmissing=true)
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male

1.1.2 Filtering with several criteria

Problem: Filtering all the rows with species == “Adelie”, sex == “male” and body_mass_g > 4000.

@filter penguins species == "Adelie" sex == "male" body_mass_g > 4000
34×7 DataFrame
9 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.2 19.6 195 4675 male
2 Adelie Torgersen 34.6 21.1 198 4400 male
3 Adelie Torgersen 42.5 20.7 197 4500 male
4 Adelie Torgersen 46.0 21.5 194 4200 male
5 Adelie Dream 39.2 21.1 196 4150 male
6 Adelie Dream 39.8 19.1 184 4650 male
7 Adelie Dream 44.1 19.7 196 4400 male
8 Adelie Dream 39.6 18.8 190 4600 male
9 Adelie Dream 42.3 21.2 191 4150 male
10 Adelie Biscoe 40.1 18.9 188 4300 male
11 Adelie Biscoe 42.0 19.5 200 4050 male
12 Adelie Biscoe 41.3 21.1 195 4400 male
13 Adelie Biscoe 41.1 18.2 192 4050 male
23 Adelie Dream 40.3 18.5 196 4350 male
24 Adelie Dream 43.2 18.5 192 4100 male
25 Adelie Biscoe 41.0 20.0 203 4725 male
26 Adelie Biscoe 37.8 20.0 190 4250 male
27 Adelie Biscoe 43.2 19.0 197 4775 male
28 Adelie Biscoe 45.6 20.3 191 4600 male
29 Adelie Biscoe 42.2 19.5 197 4275 male
30 Adelie Biscoe 42.7 18.3 196 4075 male
31 Adelie Torgersen 41.5 18.3 195 4300 male
32 Adelie Dream 37.5 18.5 199 4475 male
33 Adelie Dream 39.7 17.9 193 4250 male
34 Adelie Dream 39.2 18.6 190 4250 male
DFM.@rsubset penguins :species == "Adelie" :sex == "male" :body_mass_g > 4000
34×7 DataFrame
9 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.2 19.6 195 4675 male
2 Adelie Torgersen 34.6 21.1 198 4400 male
3 Adelie Torgersen 42.5 20.7 197 4500 male
4 Adelie Torgersen 46.0 21.5 194 4200 male
5 Adelie Dream 39.2 21.1 196 4150 male
6 Adelie Dream 39.8 19.1 184 4650 male
7 Adelie Dream 44.1 19.7 196 4400 male
8 Adelie Dream 39.6 18.8 190 4600 male
9 Adelie Dream 42.3 21.2 191 4150 male
10 Adelie Biscoe 40.1 18.9 188 4300 male
11 Adelie Biscoe 42.0 19.5 200 4050 male
12 Adelie Biscoe 41.3 21.1 195 4400 male
13 Adelie Biscoe 41.1 18.2 192 4050 male
23 Adelie Dream 40.3 18.5 196 4350 male
24 Adelie Dream 43.2 18.5 192 4100 male
25 Adelie Biscoe 41.0 20.0 203 4725 male
26 Adelie Biscoe 37.8 20.0 190 4250 male
27 Adelie Biscoe 43.2 19.0 197 4775 male
28 Adelie Biscoe 45.6 20.3 191 4600 male
29 Adelie Biscoe 42.2 19.5 197 4275 male
30 Adelie Biscoe 42.7 18.3 196 4075 male
31 Adelie Torgersen 41.5 18.3 195 4300 male
32 Adelie Dream 37.5 18.5 199 4475 male
33 Adelie Dream 39.7 17.9 193 4250 male
34 Adelie Dream 39.2 18.6 190 4250 male
subset(
    penguins
    , [:species, :sex, :body_mass_g] => 
    (x, y, z) -> (x .== "Adelie") .& (y .== "male") .& (z .> 4000)
    ,skipmissing=true
)
34×7 DataFrame
9 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.2 19.6 195 4675 male
2 Adelie Torgersen 34.6 21.1 198 4400 male
3 Adelie Torgersen 42.5 20.7 197 4500 male
4 Adelie Torgersen 46.0 21.5 194 4200 male
5 Adelie Dream 39.2 21.1 196 4150 male
6 Adelie Dream 39.8 19.1 184 4650 male
7 Adelie Dream 44.1 19.7 196 4400 male
8 Adelie Dream 39.6 18.8 190 4600 male
9 Adelie Dream 42.3 21.2 191 4150 male
10 Adelie Biscoe 40.1 18.9 188 4300 male
11 Adelie Biscoe 42.0 19.5 200 4050 male
12 Adelie Biscoe 41.3 21.1 195 4400 male
13 Adelie Biscoe 41.1 18.2 192 4050 male
23 Adelie Dream 40.3 18.5 196 4350 male
24 Adelie Dream 43.2 18.5 192 4100 male
25 Adelie Biscoe 41.0 20.0 203 4725 male
26 Adelie Biscoe 37.8 20.0 190 4250 male
27 Adelie Biscoe 43.2 19.0 197 4775 male
28 Adelie Biscoe 45.6 20.3 191 4600 male
29 Adelie Biscoe 42.2 19.5 197 4275 male
30 Adelie Biscoe 42.7 18.3 196 4075 male
31 Adelie Torgersen 41.5 18.3 195 4300 male
32 Adelie Dream 37.5 18.5 199 4475 male
33 Adelie Dream 39.7 17.9 193 4250 male
34 Adelie Dream 39.2 18.6 190 4250 male

Problem: Filtering all the rows with species == “Adelie” OR sex == “male”.

@filter penguins (species == "Adelie") | (sex == "male")
247×7 DataFrame
222 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
236 Chinstrap Dream 50.8 18.5 201 4450 male
237 Chinstrap Dream 49.0 19.6 212 4300 male
238 Chinstrap Dream 51.5 18.7 187 3250 male
239 Chinstrap Dream 51.4 19.0 201 3950 male
240 Chinstrap Dream 50.7 19.7 203 4050 male
241 Chinstrap Dream 52.2 18.8 197 3450 male
242 Chinstrap Dream 49.3 19.9 203 4050 male
243 Chinstrap Dream 50.2 18.8 202 3800 male
244 Chinstrap Dream 51.9 19.5 206 3950 male
245 Chinstrap Dream 55.8 19.8 207 4000 male
246 Chinstrap Dream 49.6 18.2 193 3775 male
247 Chinstrap Dream 50.8 19.0 210 4100 male
DFM.@rsubset penguins (:species == "Adelie") | (:sex == "male")
247×7 DataFrame
222 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
236 Chinstrap Dream 50.8 18.5 201 4450 male
237 Chinstrap Dream 49.0 19.6 212 4300 male
238 Chinstrap Dream 51.5 18.7 187 3250 male
239 Chinstrap Dream 51.4 19.0 201 3950 male
240 Chinstrap Dream 50.7 19.7 203 4050 male
241 Chinstrap Dream 52.2 18.8 197 3450 male
242 Chinstrap Dream 49.3 19.9 203 4050 male
243 Chinstrap Dream 50.2 18.8 202 3800 male
244 Chinstrap Dream 51.9 19.5 206 3950 male
245 Chinstrap Dream 55.8 19.8 207 4000 male
246 Chinstrap Dream 49.6 18.2 193 3775 male
247 Chinstrap Dream 50.8 19.0 210 4100 male
subset(penguins, [:species, :sex] => (x, y) -> (x .== "Adelie") .| (y .== "male"), skipmissing=true)
247×7 DataFrame
222 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
236 Chinstrap Dream 50.8 18.5 201 4450 male
237 Chinstrap Dream 49.0 19.6 212 4300 male
238 Chinstrap Dream 51.5 18.7 187 3250 male
239 Chinstrap Dream 51.4 19.0 201 3950 male
240 Chinstrap Dream 50.7 19.7 203 4050 male
241 Chinstrap Dream 52.2 18.8 197 3450 male
242 Chinstrap Dream 49.3 19.9 203 4050 male
243 Chinstrap Dream 50.2 18.8 202 3800 male
244 Chinstrap Dream 51.9 19.5 206 3950 male
245 Chinstrap Dream 55.8 19.8 207 4000 male
246 Chinstrap Dream 49.6 18.2 193 3775 male
247 Chinstrap Dream 50.8 19.0 210 4100 male

1.1.3 Filtering with metadata

By metadata here we mean data that is inside the dataframe, as the mean/max/min of a column.

Problem: Filtering all the rows where the flipper_length_mm is greater than the mean.

@filter penguins flipper_length_mm > mean(skipmissing(flipper_length_mm))
148×7 DataFrame
123 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Dream 35.7 18.0 202 3550 female
2 Adelie Dream 41.1 18.1 205 4300 male
3 Adelie Dream 40.8 18.9 208 4300 male
4 Adelie Biscoe 41.0 20.0 203 4725 male
5 Adelie Torgersen 41.4 18.5 202 3875 male
6 Adelie Torgersen 44.1 18.0 210 4000 male
7 Adelie Dream 41.5 18.5 201 4000 male
8 Gentoo Biscoe 46.1 13.2 211 4500 female
9 Gentoo Biscoe 50.0 16.3 230 5700 male
10 Gentoo Biscoe 48.7 14.1 210 4450 female
11 Gentoo Biscoe 50.0 15.2 218 5700 male
12 Gentoo Biscoe 47.6 14.5 215 5400 male
13 Gentoo Biscoe 46.5 13.5 210 4550 female
137 Chinstrap Dream 53.5 19.9 205 4500 male
138 Chinstrap Dream 49.0 19.5 210 3950 male
139 Chinstrap Dream 50.8 18.5 201 4450 male
140 Chinstrap Dream 49.0 19.6 212 4300 male
141 Chinstrap Dream 51.4 19.0 201 3950 male
142 Chinstrap Dream 50.7 19.7 203 4050 male
143 Chinstrap Dream 49.3 19.9 203 4050 male
144 Chinstrap Dream 50.2 18.8 202 3800 male
145 Chinstrap Dream 51.9 19.5 206 3950 male
146 Chinstrap Dream 55.8 19.8 207 4000 male
147 Chinstrap Dream 43.5 18.1 202 3400 female
148 Chinstrap Dream 50.8 19.0 210 4100 male
DFM.@subset penguins :flipper_length_mm .>= mean(skipmissing(:flipper_length_mm))
148×7 DataFrame
123 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Dream 35.7 18.0 202 3550 female
2 Adelie Dream 41.1 18.1 205 4300 male
3 Adelie Dream 40.8 18.9 208 4300 male
4 Adelie Biscoe 41.0 20.0 203 4725 male
5 Adelie Torgersen 41.4 18.5 202 3875 male
6 Adelie Torgersen 44.1 18.0 210 4000 male
7 Adelie Dream 41.5 18.5 201 4000 male
8 Gentoo Biscoe 46.1 13.2 211 4500 female
9 Gentoo Biscoe 50.0 16.3 230 5700 male
10 Gentoo Biscoe 48.7 14.1 210 4450 female
11 Gentoo Biscoe 50.0 15.2 218 5700 male
12 Gentoo Biscoe 47.6 14.5 215 5400 male
13 Gentoo Biscoe 46.5 13.5 210 4550 female
137 Chinstrap Dream 53.5 19.9 205 4500 male
138 Chinstrap Dream 49.0 19.5 210 3950 male
139 Chinstrap Dream 50.8 18.5 201 4450 male
140 Chinstrap Dream 49.0 19.6 212 4300 male
141 Chinstrap Dream 51.4 19.0 201 3950 male
142 Chinstrap Dream 50.7 19.7 203 4050 male
143 Chinstrap Dream 49.3 19.9 203 4050 male
144 Chinstrap Dream 50.2 18.8 202 3800 male
145 Chinstrap Dream 51.9 19.5 206 3950 male
146 Chinstrap Dream 55.8 19.8 207 4000 male
147 Chinstrap Dream 43.5 18.1 202 3400 female
148 Chinstrap Dream 50.8 19.0 210 4100 male
subset(penguins, :flipper_length_mm => x -> x .> mean(skipmissing(x)), skipmissing=true)
148×7 DataFrame
123 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Dream 35.7 18.0 202 3550 female
2 Adelie Dream 41.1 18.1 205 4300 male
3 Adelie Dream 40.8 18.9 208 4300 male
4 Adelie Biscoe 41.0 20.0 203 4725 male
5 Adelie Torgersen 41.4 18.5 202 3875 male
6 Adelie Torgersen 44.1 18.0 210 4000 male
7 Adelie Dream 41.5 18.5 201 4000 male
8 Gentoo Biscoe 46.1 13.2 211 4500 female
9 Gentoo Biscoe 50.0 16.3 230 5700 male
10 Gentoo Biscoe 48.7 14.1 210 4450 female
11 Gentoo Biscoe 50.0 15.2 218 5700 male
12 Gentoo Biscoe 47.6 14.5 215 5400 male
13 Gentoo Biscoe 46.5 13.5 210 4550 female
137 Chinstrap Dream 53.5 19.9 205 4500 male
138 Chinstrap Dream 49.0 19.5 210 3950 male
139 Chinstrap Dream 50.8 18.5 201 4450 male
140 Chinstrap Dream 49.0 19.6 212 4300 male
141 Chinstrap Dream 51.4 19.0 201 3950 male
142 Chinstrap Dream 50.7 19.7 203 4050 male
143 Chinstrap Dream 49.3 19.9 203 4050 male
144 Chinstrap Dream 50.2 18.8 202 3800 male
145 Chinstrap Dream 51.9 19.5 206 3950 male
146 Chinstrap Dream 55.8 19.8 207 4000 male
147 Chinstrap Dream 43.5 18.1 202 3400 female
148 Chinstrap Dream 50.8 19.0 210 4100 male

1.1.4 Filtering with a variable column name

Suppose the column you want to filter is a variable, let’s say a symbol

my_column = :species;

Problem: Filtering all the rows where the column stored in my_column is “Adelie”.

@eval @filter penguins $my_column == "Adelie"
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male
DFM.@rsubset penguins $my_column == "Adelie"
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male
subset(penguins, my_column => x -> x .== "Adelie")
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male

In case the column is a string

my_column_string = "species";

instead of a symbol, we can write in the same way, just taking care in Tidier to convert it to a symbol

@eval @filter penguins $(Symbol(my_column_string)) == "Adelie"
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male
DFM.@rsubset penguins $(my_column_string) == "Adelie"
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male
subset(penguins, my_column_string => x -> x .== "Adelie")
152×7 DataFrame
127 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing
11 Adelie Torgersen 37.8 17.1 186 3300 missing
12 Adelie Torgersen 37.8 17.3 180 3700 missing
13 Adelie Torgersen 41.1 17.6 182 3200 female
141 Adelie Dream 40.2 17.1 193 3400 female
142 Adelie Dream 40.6 17.2 187 3475 male
143 Adelie Dream 32.1 15.5 188 3050 female
144 Adelie Dream 40.7 17.0 190 3725 male
145 Adelie Dream 37.3 16.8 192 3000 female
146 Adelie Dream 39.0 18.7 185 3650 male
147 Adelie Dream 39.2 18.6 190 4250 male
148 Adelie Dream 36.6 18.4 184 3475 female
149 Adelie Dream 36.0 17.8 195 3450 female
150 Adelie Dream 37.8 18.1 193 3750 male
151 Adelie Dream 36.0 17.1 187 3700 female
152 Adelie Dream 41.5 18.5 201 4000 male

1.2 Arranging

To arrange a dataframe means to reorder the rows according to the order of some columns. The rows are first arranged by the first column, then by the second (if any), and so on. In Tidier, when we want to invert the ordering, just put the column name inside a desc() call.

1.2.1 Arranging by one column

Problem: Arrange by body_mass_g.

@arrange penguins body_mass_g
344×7 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Chinstrap Dream 46.9 16.6 192 2700 female
2 Adelie Biscoe 36.5 16.6 181 2850 female
3 Adelie Biscoe 36.4 17.1 184 2850 female
4 Adelie Biscoe 34.5 18.1 187 2900 female
5 Adelie Dream 33.1 16.1 178 2900 female
6 Adelie Torgersen 38.6 17.0 188 2900 female
7 Chinstrap Dream 43.2 16.6 187 2900 female
8 Adelie Biscoe 37.9 18.6 193 2925 female
9 Adelie Dream 37.5 18.9 179 2975 missing
10 Adelie Dream 37.0 16.9 185 3000 female
11 Adelie Dream 37.3 16.8 192 3000 female
12 Adelie Torgersen 35.9 16.6 190 3050 female
13 Adelie Torgersen 35.2 15.9 186 3050 female
333 Gentoo Biscoe 48.6 16.0 230 5800 male
334 Gentoo Biscoe 48.4 14.6 213 5850 male
335 Gentoo Biscoe 49.3 15.7 217 5850 male
336 Gentoo Biscoe 55.1 16.0 230 5850 male
337 Gentoo Biscoe 45.2 16.4 223 5950 male
338 Gentoo Biscoe 49.8 15.9 229 5950 male
339 Gentoo Biscoe 51.1 16.3 220 6000 male
340 Gentoo Biscoe 48.8 16.2 222 6000 male
341 Gentoo Biscoe 59.6 17.0 230 6050 male
342 Gentoo Biscoe 49.2 15.2 221 6300 male
343 Adelie Torgersen missing missing missing missing missing
344 Gentoo Biscoe missing missing missing missing missing
DFM.@orderby penguins :body_mass_g
344×7 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Chinstrap Dream 46.9 16.6 192 2700 female
2 Adelie Biscoe 36.5 16.6 181 2850 female
3 Adelie Biscoe 36.4 17.1 184 2850 female
4 Adelie Biscoe 34.5 18.1 187 2900 female
5 Adelie Dream 33.1 16.1 178 2900 female
6 Adelie Torgersen 38.6 17.0 188 2900 female
7 Chinstrap Dream 43.2 16.6 187 2900 female
8 Adelie Biscoe 37.9 18.6 193 2925 female
9 Adelie Dream 37.5 18.9 179 2975 missing
10 Adelie Dream 37.0 16.9 185 3000 female
11 Adelie Dream 37.3 16.8 192 3000 female
12 Adelie Torgersen 35.9 16.6 190 3050 female
13 Adelie Torgersen 35.2 15.9 186 3050 female
333 Gentoo Biscoe 48.6 16.0 230 5800 male
334 Gentoo Biscoe 48.4 14.6 213 5850 male
335 Gentoo Biscoe 49.3 15.7 217 5850 male
336 Gentoo Biscoe 55.1 16.0 230 5850 male
337 Gentoo Biscoe 45.2 16.4 223 5950 male
338 Gentoo Biscoe 49.8 15.9 229 5950 male
339 Gentoo Biscoe 51.1 16.3 220 6000 male
340 Gentoo Biscoe 48.8 16.2 222 6000 male
341 Gentoo Biscoe 59.6 17.0 230 6050 male
342 Gentoo Biscoe 49.2 15.2 221 6300 male
343 Adelie Torgersen missing missing missing missing missing
344 Gentoo Biscoe missing missing missing missing missing
sort(penguins, :body_mass_g)
344×7 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Chinstrap Dream 46.9 16.6 192 2700 female
2 Adelie Biscoe 36.5 16.6 181 2850 female
3 Adelie Biscoe 36.4 17.1 184 2850 female
4 Adelie Biscoe 34.5 18.1 187 2900 female
5 Adelie Dream 33.1 16.1 178 2900 female
6 Adelie Torgersen 38.6 17.0 188 2900 female
7 Chinstrap Dream 43.2 16.6 187 2900 female
8 Adelie Biscoe 37.9 18.6 193 2925 female
9 Adelie Dream 37.5 18.9 179 2975 missing
10 Adelie Dream 37.0 16.9 185 3000 female
11 Adelie Dream 37.3 16.8 192 3000 female
12 Adelie Torgersen 35.9 16.6 190 3050 female
13 Adelie Torgersen 35.2 15.9 186 3050 female
333 Gentoo Biscoe 48.6 16.0 230 5800 male
334 Gentoo Biscoe 48.4 14.6 213 5850 male
335 Gentoo Biscoe 49.3 15.7 217 5850 male
336 Gentoo Biscoe 55.1 16.0 230 5850 male
337 Gentoo Biscoe 45.2 16.4 223 5950 male
338 Gentoo Biscoe 49.8 15.9 229 5950 male
339 Gentoo Biscoe 51.1 16.3 220 6000 male
340 Gentoo Biscoe 48.8 16.2 222 6000 male
341 Gentoo Biscoe 59.6 17.0 230 6050 male
342 Gentoo Biscoe 49.2 15.2 221 6300 male
343 Adelie Torgersen missing missing missing missing missing
344 Gentoo Biscoe missing missing missing missing missing

1.2.2 Arranging by two columns, with one reversed

Problem: First arrange by island, then by reversed body_mass_g.

@arrange penguins island desc(body_mass_g)
344×7 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Gentoo Biscoe missing missing missing missing missing
2 Gentoo Biscoe 49.2 15.2 221 6300 male
3 Gentoo Biscoe 59.6 17.0 230 6050 male
4 Gentoo Biscoe 51.1 16.3 220 6000 male
5 Gentoo Biscoe 48.8 16.2 222 6000 male
6 Gentoo Biscoe 45.2 16.4 223 5950 male
7 Gentoo Biscoe 49.8 15.9 229 5950 male
8 Gentoo Biscoe 48.4 14.6 213 5850 male
9 Gentoo Biscoe 49.3 15.7 217 5850 male
10 Gentoo Biscoe 55.1 16.0 230 5850 male
11 Gentoo Biscoe 49.5 16.2 229 5800 male
12 Gentoo Biscoe 48.6 16.0 230 5800 male
13 Gentoo Biscoe 50.4 15.7 222 5750 male
333 Adelie Torgersen 41.1 18.6 189 3325 male
334 Adelie Torgersen 38.5 17.9 190 3325 female
335 Adelie Torgersen 37.8 17.1 186 3300 missing
336 Adelie Torgersen 38.8 17.6 191 3275 female
337 Adelie Torgersen 40.3 18.0 195 3250 female
338 Adelie Torgersen 41.1 17.6 182 3200 female
339 Adelie Torgersen 34.6 17.2 189 3200 female
340 Adelie Torgersen 36.2 17.2 187 3150 female
341 Adelie Torgersen 35.9 16.6 190 3050 female
342 Adelie Torgersen 35.2 15.9 186 3050 female
343 Adelie Torgersen 39.0 17.1 191 3050 female
344 Adelie Torgersen 38.6 17.0 188 2900 female
# works only when the reversed column is numeric?

DFM.@orderby penguins :island :body_mass_g .* -1
344×7 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Gentoo Biscoe 49.2 15.2 221 6300 male
2 Gentoo Biscoe 59.6 17.0 230 6050 male
3 Gentoo Biscoe 51.1 16.3 220 6000 male
4 Gentoo Biscoe 48.8 16.2 222 6000 male
5 Gentoo Biscoe 45.2 16.4 223 5950 male
6 Gentoo Biscoe 49.8 15.9 229 5950 male
7 Gentoo Biscoe 48.4 14.6 213 5850 male
8 Gentoo Biscoe 49.3 15.7 217 5850 male
9 Gentoo Biscoe 55.1 16.0 230 5850 male
10 Gentoo Biscoe 49.5 16.2 229 5800 male
11 Gentoo Biscoe 48.6 16.0 230 5800 male
12 Gentoo Biscoe 50.4 15.7 222 5750 male
13 Gentoo Biscoe 50.0 16.3 230 5700 male
333 Adelie Torgersen 38.5 17.9 190 3325 female
334 Adelie Torgersen 37.8 17.1 186 3300 missing
335 Adelie Torgersen 38.8 17.6 191 3275 female
336 Adelie Torgersen 40.3 18.0 195 3250 female
337 Adelie Torgersen 41.1 17.6 182 3200 female
338 Adelie Torgersen 34.6 17.2 189 3200 female
339 Adelie Torgersen 36.2 17.2 187 3150 female
340 Adelie Torgersen 35.9 16.6 190 3050 female
341 Adelie Torgersen 35.2 15.9 186 3050 female
342 Adelie Torgersen 39.0 17.1 191 3050 female
343 Adelie Torgersen 38.6 17.0 188 2900 female
344 Adelie Torgersen missing missing missing missing missing
sort(penguins, [order(:island), order(:body_mass_g, rev=true)])
344×7 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Gentoo Biscoe missing missing missing missing missing
2 Gentoo Biscoe 49.2 15.2 221 6300 male
3 Gentoo Biscoe 59.6 17.0 230 6050 male
4 Gentoo Biscoe 51.1 16.3 220 6000 male
5 Gentoo Biscoe 48.8 16.2 222 6000 male
6 Gentoo Biscoe 45.2 16.4 223 5950 male
7 Gentoo Biscoe 49.8 15.9 229 5950 male
8 Gentoo Biscoe 48.4 14.6 213 5850 male
9 Gentoo Biscoe 49.3 15.7 217 5850 male
10 Gentoo Biscoe 55.1 16.0 230 5850 male
11 Gentoo Biscoe 49.5 16.2 229 5800 male
12 Gentoo Biscoe 48.6 16.0 230 5800 male
13 Gentoo Biscoe 50.4 15.7 222 5750 male
333 Adelie Torgersen 41.1 18.6 189 3325 male
334 Adelie Torgersen 38.5 17.9 190 3325 female
335 Adelie Torgersen 37.8 17.1 186 3300 missing
336 Adelie Torgersen 38.8 17.6 191 3275 female
337 Adelie Torgersen 40.3 18.0 195 3250 female
338 Adelie Torgersen 41.1 17.6 182 3200 female
339 Adelie Torgersen 34.6 17.2 189 3200 female
340 Adelie Torgersen 36.2 17.2 187 3150 female
341 Adelie Torgersen 35.9 16.6 190 3050 female
342 Adelie Torgersen 35.2 15.9 186 3050 female
343 Adelie Torgersen 39.0 17.1 191 3050 female
344 Adelie Torgersen 38.6 17.0 188 2900 female

1.2.3 Arranging by one variable column

Problem: Arrange by a column stored in a variable my_arrange_column.

my_arrange_column = :body_mass_g;
@eval @arrange penguins $my_arrange_column
344×7 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Chinstrap Dream 46.9 16.6 192 2700 female
2 Adelie Biscoe 36.5 16.6 181 2850 female
3 Adelie Biscoe 36.4 17.1 184 2850 female
4 Adelie Biscoe 34.5 18.1 187 2900 female
5 Adelie Dream 33.1 16.1 178 2900 female
6 Adelie Torgersen 38.6 17.0 188 2900 female
7 Chinstrap Dream 43.2 16.6 187 2900 female
8 Adelie Biscoe 37.9 18.6 193 2925 female
9 Adelie Dream 37.5 18.9 179 2975 missing
10 Adelie Dream 37.0 16.9 185 3000 female
11 Adelie Dream 37.3 16.8 192 3000 female
12 Adelie Torgersen 35.9 16.6 190 3050 female
13 Adelie Torgersen 35.2 15.9 186 3050 female
333 Gentoo Biscoe 48.6 16.0 230 5800 male
334 Gentoo Biscoe 48.4 14.6 213 5850 male
335 Gentoo Biscoe 49.3 15.7 217 5850 male
336 Gentoo Biscoe 55.1 16.0 230 5850 male
337 Gentoo Biscoe 45.2 16.4 223 5950 male
338 Gentoo Biscoe 49.8 15.9 229 5950 male
339 Gentoo Biscoe 51.1 16.3 220 6000 male
340 Gentoo Biscoe 48.8 16.2 222 6000 male
341 Gentoo Biscoe 59.6 17.0 230 6050 male
342 Gentoo Biscoe 49.2 15.2 221 6300 male
343 Adelie Torgersen missing missing missing missing missing
344 Gentoo Biscoe missing missing missing missing missing
DFM.@orderby penguins $my_arrange_column
344×7 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Chinstrap Dream 46.9 16.6 192 2700 female
2 Adelie Biscoe 36.5 16.6 181 2850 female
3 Adelie Biscoe 36.4 17.1 184 2850 female
4 Adelie Biscoe 34.5 18.1 187 2900 female
5 Adelie Dream 33.1 16.1 178 2900 female
6 Adelie Torgersen 38.6 17.0 188 2900 female
7 Chinstrap Dream 43.2 16.6 187 2900 female
8 Adelie Biscoe 37.9 18.6 193 2925 female
9 Adelie Dream 37.5 18.9 179 2975 missing
10 Adelie Dream 37.0 16.9 185 3000 female
11 Adelie Dream 37.3 16.8 192 3000 female
12 Adelie Torgersen 35.9 16.6 190 3050 female
13 Adelie Torgersen 35.2 15.9 186 3050 female
333 Gentoo Biscoe 48.6 16.0 230 5800 male
334 Gentoo Biscoe 48.4 14.6 213 5850 male
335 Gentoo Biscoe 49.3 15.7 217 5850 male
336 Gentoo Biscoe 55.1 16.0 230 5850 male
337 Gentoo Biscoe 45.2 16.4 223 5950 male
338 Gentoo Biscoe 49.8 15.9 229 5950 male
339 Gentoo Biscoe 51.1 16.3 220 6000 male
340 Gentoo Biscoe 48.8 16.2 222 6000 male
341 Gentoo Biscoe 59.6 17.0 230 6050 male
342 Gentoo Biscoe 49.2 15.2 221 6300 male
343 Adelie Torgersen missing missing missing missing missing
344 Gentoo Biscoe missing missing missing missing missing
sort(penguins, my_arrange_column)
344×7 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Chinstrap Dream 46.9 16.6 192 2700 female
2 Adelie Biscoe 36.5 16.6 181 2850 female
3 Adelie Biscoe 36.4 17.1 184 2850 female
4 Adelie Biscoe 34.5 18.1 187 2900 female
5 Adelie Dream 33.1 16.1 178 2900 female
6 Adelie Torgersen 38.6 17.0 188 2900 female
7 Chinstrap Dream 43.2 16.6 187 2900 female
8 Adelie Biscoe 37.9 18.6 193 2925 female
9 Adelie Dream 37.5 18.9 179 2975 missing
10 Adelie Dream 37.0 16.9 185 3000 female
11 Adelie Dream 37.3 16.8 192 3000 female
12 Adelie Torgersen 35.9 16.6 190 3050 female
13 Adelie Torgersen 35.2 15.9 186 3050 female
333 Gentoo Biscoe 48.6 16.0 230 5800 male
334 Gentoo Biscoe 48.4 14.6 213 5850 male
335 Gentoo Biscoe 49.3 15.7 217 5850 male
336 Gentoo Biscoe 55.1 16.0 230 5850 male
337 Gentoo Biscoe 45.2 16.4 223 5950 male
338 Gentoo Biscoe 49.8 15.9 229 5950 male
339 Gentoo Biscoe 51.1 16.3 220 6000 male
340 Gentoo Biscoe 48.8 16.2 222 6000 male
341 Gentoo Biscoe 59.6 17.0 230 6050 male
342 Gentoo Biscoe 49.2 15.2 221 6300 male
343 Adelie Torgersen missing missing missing missing missing
344 Gentoo Biscoe missing missing missing missing missing