2 Operations on columns

using DataFrames, PalmerPenguins
using Tidier
import DataFramesMeta as DFM

penguins = PalmerPenguins.load() |> DataFrame;
@slice_head(penguins, n = 10)
10×7 DataFrame
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
String15 String15 Float64? Float64? Int64? Int64? String7?
1 Adelie Torgersen 39.1 18.7 181 3750 male
2 Adelie Torgersen 39.5 17.4 186 3800 female
3 Adelie Torgersen 40.3 18.0 195 3250 female
4 Adelie Torgersen missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female
6 Adelie Torgersen 39.3 20.6 190 3650 male
7 Adelie Torgersen 38.9 17.8 181 3625 female
8 Adelie Torgersen 39.2 19.6 195 4675 male
9 Adelie Torgersen 34.1 18.1 193 3475 missing
10 Adelie Torgersen 42.0 20.2 190 4250 missing

2.1 Selecting (or: throwing columns away)

2.1.1 Selecting n columns

Problem: Select only some columns.

@select penguins species body_mass_g
344×2 DataFrame
319 rows omitted
Row species body_mass_g
String15 Int64?
1 Adelie 3750
2 Adelie 3800
3 Adelie 3250
4 Adelie missing
5 Adelie 3450
6 Adelie 3650
7 Adelie 3625
8 Adelie 4675
9 Adelie 3475
10 Adelie 4250
11 Adelie 3300
12 Adelie 3700
13 Adelie 3200
333 Chinstrap 3250
334 Chinstrap 4050
335 Chinstrap 3800
336 Chinstrap 3525
337 Chinstrap 3950
338 Chinstrap 3650
339 Chinstrap 3650
340 Chinstrap 4000
341 Chinstrap 3400
342 Chinstrap 3775
343 Chinstrap 4100
344 Chinstrap 3775
DFM.@select penguins :species :body_mass_g
344×2 DataFrame
319 rows omitted
Row species body_mass_g
String15 Int64?
1 Adelie 3750
2 Adelie 3800
3 Adelie 3250
4 Adelie missing
5 Adelie 3450
6 Adelie 3650
7 Adelie 3625
8 Adelie 4675
9 Adelie 3475
10 Adelie 4250
11 Adelie 3300
12 Adelie 3700
13 Adelie 3200
333 Chinstrap 3250
334 Chinstrap 4050
335 Chinstrap 3800
336 Chinstrap 3525
337 Chinstrap 3950
338 Chinstrap 3650
339 Chinstrap 3650
340 Chinstrap 4000
341 Chinstrap 3400
342 Chinstrap 3775
343 Chinstrap 4100
344 Chinstrap 3775
DFM.select(penguins, [:species, :body_mass_g])
344×2 DataFrame
319 rows omitted
Row species body_mass_g
String15 Int64?
1 Adelie 3750
2 Adelie 3800
3 Adelie 3250
4 Adelie missing
5 Adelie 3450
6 Adelie 3650
7 Adelie 3625
8 Adelie 4675
9 Adelie 3475
10 Adelie 4250
11 Adelie 3300
12 Adelie 3700
13 Adelie 3200
333 Chinstrap 3250
334 Chinstrap 4050
335 Chinstrap 3800
336 Chinstrap 3525
337 Chinstrap 3950
338 Chinstrap 3650
339 Chinstrap 3650
340 Chinstrap 4000
341 Chinstrap 3400
342 Chinstrap 3775
343 Chinstrap 4100
344 Chinstrap 3775
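
The same selection can also be written with plain DataFrames.jl functions, which is a useful baseline when comparing the two macro packages. The following is a minimal sketch, assuming only DataFrames.jl is loaded; the `toy` table is a hypothetical stand-in for `penguins`, not part of the original data.

```julia
using DataFrames

# Hypothetical toy table standing in for the penguins data
toy = DataFrame(
    species = ["Adelie", "Gentoo", "Chinstrap"],
    island = ["Torgersen", "Biscoe", "Dream"],
    body_mass_g = [3750, 5000, 3250],
)

# Keep two columns by naming them
kept = select(toy, :species, :body_mass_g)

# Same result, phrased as "throw away the column that is not wanted"
dropped = select(toy, Not(:island))

# Indexing with a vector of column names also returns a new DataFrame
indexed = toy[:, [:species, :body_mass_g]]
```

All three calls return a new two-column DataFrame and leave `toy` unchanged.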

2.1.2 Selecting columns from a variable

Problem: Select only some columns whose names are stored in a variable.

my_columns = [:species, :body_mass_g];
@eval @select penguins $my_columns...
344×2 DataFrame
319 rows omitted
Row species body_mass_g
String15 Int64?
1 Adelie 3750
2 Adelie 3800
3 Adelie 3250
4 Adelie missing
5 Adelie 3450
6 Adelie 3650
7 Adelie 3625
8 Adelie 4675
9 Adelie 3475
10 Adelie 4250
11 Adelie 3300
12 Adelie 3700
13 Adelie 3200
333 Chinstrap 3250
334 Chinstrap 4050
335 Chinstrap 3800
336 Chinstrap 3525
337 Chinstrap 3950
338 Chinstrap 3650
339 Chinstrap 3650
340 Chinstrap 4000
341 Chinstrap 3400
342 Chinstrap 3775
343 Chinstrap 4100
344 Chinstrap 3775
DFM.@select penguins $my_columns
344×2 DataFrame
319 rows omitted
Row species body_mass_g
String15 Int64?
1 Adelie 3750
2 Adelie 3800
3 Adelie 3250
4 Adelie missing
5 Adelie 3450
6 Adelie 3650
7 Adelie 3625
8 Adelie 4675
9 Adelie 3475
10 Adelie 4250
11 Adelie 3300
12 Adelie 3700
13 Adelie 3200
333 Chinstrap 3250
334 Chinstrap 4050
335 Chinstrap 3800
336 Chinstrap 3525
337 Chinstrap 3950
338 Chinstrap 3650
339 Chinstrap 3650
340 Chinstrap 4000
341 Chinstrap 3400
342 Chinstrap 3775
343 Chinstrap 4100
344 Chinstrap 3775
DFM.select(penguins, my_columns)
344×2 DataFrame
319 rows omitted
Row species body_mass_g
String15 Int64?
1 Adelie 3750
2 Adelie 3800
3 Adelie 3250
4 Adelie missing
5 Adelie 3450
6 Adelie 3650
7 Adelie 3625
8 Adelie 4675
9 Adelie 3475
10 Adelie 4250
11 Adelie 3300
12 Adelie 3700
13 Adelie 3200
333 Chinstrap 3250
334 Chinstrap 4050
335 Chinstrap 3800
336 Chinstrap 3525
337 Chinstrap 3950
338 Chinstrap 3650
339 Chinstrap 3650
340 Chinstrap 4000
341 Chinstrap 3400
342 Chinstrap 3775
343 Chinstrap 4100
344 Chinstrap 3775
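
Without macros, a variable holding column names can be passed directly, since it is an ordinary function argument. The sketch below assumes only DataFrames.jl; `toy`, `my_columns`, and `my_strings` are hypothetical names used for illustration.

```julia
using DataFrames

# Hypothetical toy table standing in for the penguins data
toy = DataFrame(
    species = ["Adelie", "Gentoo", "Chinstrap"],
    island = ["Torgersen", "Biscoe", "Dream"],
    body_mass_g = [3750, 5000, 3250],
)

my_columns = [:species, :body_mass_g]     # column names as Symbols
my_strings = ["species", "body_mass_g"]   # column names as Strings

# select accepts either kind of vector
by_symbols = select(toy, my_columns)
by_strings = select(toy, my_strings)

# Row/column indexing accepts the same vectors
by_index = toy[:, my_columns]
```

Because no macro is involved here, no interpolation with `$` is needed: the vector is evaluated like any other argument.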

2.2 Mutating (or: creating columns)

2.2.1 Creating one column based on another one

Problem: Create the column body_mass_kg by dividing body_mass_g by 1000.

@mutate penguins body_mass_kg = body_mass_g / 1000
344×8 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex body_mass_kg
String15 String15 Float64? Float64? Int64? Int64? String7? Float64?
1 Adelie Torgersen 39.1 18.7 181 3750 male 3.75
2 Adelie Torgersen 39.5 17.4 186 3800 female 3.8
3 Adelie Torgersen 40.3 18.0 195 3250 female 3.25
4 Adelie Torgersen missing missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female 3.45
6 Adelie Torgersen 39.3 20.6 190 3650 male 3.65
7 Adelie Torgersen 38.9 17.8 181 3625 female 3.625
8 Adelie Torgersen 39.2 19.6 195 4675 male 4.675
9 Adelie Torgersen 34.1 18.1 193 3475 missing 3.475
10 Adelie Torgersen 42.0 20.2 190 4250 missing 4.25
11 Adelie Torgersen 37.8 17.1 186 3300 missing 3.3
12 Adelie Torgersen 37.8 17.3 180 3700 missing 3.7
13 Adelie Torgersen 41.1 17.6 182 3200 female 3.2
333 Chinstrap Dream 45.2 16.6 191 3250 female 3.25
334 Chinstrap Dream 49.3 19.9 203 4050 male 4.05
335 Chinstrap Dream 50.2 18.8 202 3800 male 3.8
336 Chinstrap Dream 45.6 19.4 194 3525 female 3.525
337 Chinstrap Dream 51.9 19.5 206 3950 male 3.95
338 Chinstrap Dream 46.8 16.5 189 3650 female 3.65
339 Chinstrap Dream 45.7 17.0 195 3650 female 3.65
340 Chinstrap Dream 55.8 19.8 207 4000 male 4.0
341 Chinstrap Dream 43.5 18.1 202 3400 female 3.4
342 Chinstrap Dream 49.6 18.2 193 3775 male 3.775
343 Chinstrap Dream 50.8 19.0 210 4100 male 4.1
344 Chinstrap Dream 50.2 18.7 198 3775 female 3.775
DFM.@rtransform penguins :body_mass_kg = :body_mass_g / 1000
344×8 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex body_mass_kg
String15 String15 Float64? Float64? Int64? Int64? String7? Float64?
1 Adelie Torgersen 39.1 18.7 181 3750 male 3.75
2 Adelie Torgersen 39.5 17.4 186 3800 female 3.8
3 Adelie Torgersen 40.3 18.0 195 3250 female 3.25
4 Adelie Torgersen missing missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female 3.45
6 Adelie Torgersen 39.3 20.6 190 3650 male 3.65
7 Adelie Torgersen 38.9 17.8 181 3625 female 3.625
8 Adelie Torgersen 39.2 19.6 195 4675 male 4.675
9 Adelie Torgersen 34.1 18.1 193 3475 missing 3.475
10 Adelie Torgersen 42.0 20.2 190 4250 missing 4.25
11 Adelie Torgersen 37.8 17.1 186 3300 missing 3.3
12 Adelie Torgersen 37.8 17.3 180 3700 missing 3.7
13 Adelie Torgersen 41.1 17.6 182 3200 female 3.2
333 Chinstrap Dream 45.2 16.6 191 3250 female 3.25
334 Chinstrap Dream 49.3 19.9 203 4050 male 4.05
335 Chinstrap Dream 50.2 18.8 202 3800 male 3.8
336 Chinstrap Dream 45.6 19.4 194 3525 female 3.525
337 Chinstrap Dream 51.9 19.5 206 3950 male 3.95
338 Chinstrap Dream 46.8 16.5 189 3650 female 3.65
339 Chinstrap Dream 45.7 17.0 195 3650 female 3.65
340 Chinstrap Dream 55.8 19.8 207 4000 male 4.0
341 Chinstrap Dream 43.5 18.1 202 3400 female 3.4
342 Chinstrap Dream 49.6 18.2 193 3775 male 3.775
343 Chinstrap Dream 50.8 19.0 210 4100 male 4.1
344 Chinstrap Dream 50.2 18.7 198 3775 female 3.775
penguins2 = copy(penguins);
penguins2.body_mass_kg = penguins2.body_mass_g ./ 1000;
penguins2
344×8 DataFrame
319 rows omitted
Row species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex body_mass_kg
String15 String15 Float64? Float64? Int64? Int64? String7? Float64?
1 Adelie Torgersen 39.1 18.7 181 3750 male 3.75
2 Adelie Torgersen 39.5 17.4 186 3800 female 3.8
3 Adelie Torgersen 40.3 18.0 195 3250 female 3.25
4 Adelie Torgersen missing missing missing missing missing missing
5 Adelie Torgersen 36.7 19.3 193 3450 female 3.45
6 Adelie Torgersen 39.3 20.6 190 3650 male 3.65
7 Adelie Torgersen 38.9 17.8 181 3625 female 3.625
8 Adelie Torgersen 39.2 19.6 195 4675 male 4.675
9 Adelie Torgersen 34.1 18.1 193 3475 missing 3.475
10 Adelie Torgersen 42.0 20.2 190 4250 missing 4.25
11 Adelie Torgersen 37.8 17.1 186 3300 missing 3.3
12 Adelie Torgersen 37.8 17.3 180 3700 missing 3.7
13 Adelie Torgersen 41.1 17.6 182 3200 female 3.2
333 Chinstrap Dream 45.2 16.6 191 3250 female 3.25
334 Chinstrap Dream 49.3 19.9 203 4050 male 4.05
335 Chinstrap Dream 50.2 18.8 202 3800 male 3.8
336 Chinstrap Dream 45.6 19.4 194 3525 female 3.525
337 Chinstrap Dream 51.9 19.5 206 3950 male 3.95
338 Chinstrap Dream 46.8 16.5 189 3650 female 3.65
339 Chinstrap Dream 45.7 17.0 195 3650 female 3.65
340 Chinstrap Dream 55.8 19.8 207 4000 male 4.0
341 Chinstrap Dream 43.5 18.1 202 3400 female 3.4
342 Chinstrap Dream 49.6 18.2 193 3775 male 3.775
343 Chinstrap Dream 50.8 19.0 210 4100 male 4.1
344 Chinstrap Dream 50.2 18.7 198 3775 female 3.775
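
For comparison, the same derived column can be sketched with the macro-free `transform` function from DataFrames.jl. This is a minimal illustration, assuming only DataFrames.jl is loaded; `toy` and `toy2` are hypothetical tables, with one `missing` value included to show how it propagates.

```julia
using DataFrames

# Hypothetical toy table with one missing body mass
toy = DataFrame(
    species = ["Adelie", "Adelie", "Chinstrap"],
    body_mass_g = [3750, missing, 3250],
)

# transform returns a new DataFrame; ByRow applies the function to each value,
# and missing / 1000 evaluates to missing
with_kg = transform(toy, :body_mass_g => ByRow(g -> g / 1000) => :body_mass_kg)

# transform! adds the column in place, so work on a copy to keep the original intact
toy2 = copy(toy)
transform!(toy2, :body_mass_g => ByRow(g -> g / 1000) => :body_mass_kg)
```

The non-mutating `transform` leaves `toy` with its two original columns, whereas direct assignment such as `toy2.body_mass_kg = ...` or `transform!` changes the table it is applied to, which is why a copy is made first.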

2.3 Conditionally mutating columns