Skip to content

Pivoting

Pivoting a dataset is needed when information sitting inside of cell values needs to be converted into column names (to make the dataset wider) or vice verse (to make the dataset longer). Either action can be referred to as "reshaping" a dataset, and various frameworks refer to the actions as unstacking/stacking or spreading/gathering. In R's tidyverse, these actions are referred to as pivoting, where the two accompanying actions are @pivot_wider() and @pivot_longer().

@pivot_wider()¤

Pivoting a dataset to make it wider is needed when information sitting inside of cell values needs to be converted into column names. The wider format is sometimes required for the purposes of calculating correlations or running statistical tests.

Let's start with a "long" DataFrame and make it wide. Why would we want to make it wide? Well, if we wanted to calculate a correlation between A and B for rows with corresponding id numbers, we may need to first make sure that A and B are represented in adjacent columns.

using Tidier

df_long = DataFrame(id = [1, 1, 2, 2],
                    variable = ["A", "B", "A", "B"],
                    value = [1, 2, 3, 4])
4×3 DataFrame
Rowidvariablevalue
Int64StringInt64
11A1
21B2
32A3
42B4

To make this dataset wider, we can do the following:

@pivot_wider(df_long, names_from = variable, values_from = value)
2×3 DataFrame
RowidAB
Int64Int64?Int64?
1112
2234

In @pivot_wider(), both the names_from and values_from arguments are required.

@pivot_longer()¤

For calculating summary statistics (e.g., mean) by groups, or for plotting purposes, DataFrames often need to be converted to their longer form. For this, we can use @pivot_longer. First, let's start with a "wide" DataFrame.

df_wide = DataFrame(id = [1, 2], A = [1, 3], B = [2, 4])
2×3 DataFrame
RowidAB
Int64Int64Int64
1112
2234

Now, let's transform this wide dataset into the longer form. Unlike @pivot_wider(), where providing the names_from and values_from arguments is required, the only item that's required in @pivot_wider() is a set of columns to pivot. The names_to and values_to arguments are optional, and if not provided, they will default to "variable" and "value", respectively.

We can recreate the original long dataset by doing the following. Multiple columns must be provided using selection syntax or a selection helper. Tuples containing multiple columns are not yet supported.

@pivot_longer(df_wide, A:B)
4×3 DataFrame
Rowidvariablevalue
Int64StringInt64
11A1
22A3
31B2
42B4

Here is another way of providing the same result using a different type of selection syntax.

@pivot_longer(df_wide, -id)
4×3 DataFrame
Rowidvariablevalue
Int64StringInt64
11A1
22A3
31B2
42B4

In this example, we set the names_to and values_to arguments. Either argument can be left out and will revert to the default value. The names_to and values_to arguments must be bare unquoted variable names. Strings containing variable names are not yet supported.

@pivot_longer(df_wide, A:B, names_to = letter, values_to = number)
4×3 DataFrame
Rowidletternumber
Int64StringInt64
11A1
22A3
31B2
42B4

This page was generated using Literate.jl.