Getting Started
using TidierPlots
using DataFrames
using PalmerPenguins
penguins = dropmissing(DataFrame(PalmerPenguins.load()));
ggplot()
¤
ggplot()
is the starting point of any plot. It sets up the initial plot with default settings that can be later customized with geoms, scales, theme settings and other specifications. ggplot
usually used with a data source as an argument, and optionally, a set of aesthetics specified by @aes(). The data source is typically a DataFrame.
If a set of aesthetics is specified in the initial ggplot call, these aesthetics apply to all layers added to the plot, unless they are overridden in subsequent layers.
ggplot(penguins, @aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
geom_point()
@aes()
¤
aes()
is used to map variables in your data to visual properties (aesthetics) of the plot. These aesthetics can include things like position (x and y coordinates), color, shape, size, etc. Each aesthetic is a way of visualizing a variable or a statistical transformation of a variable.
Aesthetics are specified in the form aes(aesthetic = variable), where aesthetic is the name of the aesthetic, and variable is the column name in your data that you want to map to the aesthetic. The variable names do not need to be preceded by a colon.
Of note, TidierPlots.jl accepts multiple forms for aes specification, none of which is exactly the same as ggplot2.
- Option 1:
@aes
macro, aes as in ggplot, e.g.@aes(x = x, y = y)
- Option 2:
@es
macro, aes as in ggplot, e.g.@es(x = x, y = y)
- Option 3:
aes
function, julia-style columns, e.g.aes(x = :x, y = :y)
- Option 4:
aes
function, strings for columns, e.g.aes(x = "x", y = "y")
The examples below will generally use option 1
In the above example, we can see that the x, y, and color aesthetics are being mapped to each subsequent layer of the plot.
In general, aes()
can be used within the ggplot
to set global aesthetics that apply to all layers, or within individual geoms to set aesthetics that apply only to that layer.
Moving from general rules, to specific plots, let us first explore geom_point()
geom_point()
geom_point
is used to create a scatter plot. It is typically used with aesthetics mapping variables to x and y positions, and optionally to other aesthetics like color, shape, and size. geom_point
can be used to visualize the relationship between two continuous variables, or a continuous and a discrete variable. The following visuals features can be changed within geom_point()
, shape, size, stroke, strokecolour, and alpha .
ggplot(penguins, @aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
geom_point( size = 20,
stroke = 1,
strokecolor = "black",
alpha = 0.2) +
labs(x = "Bill Length (mm)", y = "Bill Width (mm)") +
lims(x = c(40, 60), y = c(15, 20)) +
theme_minimal()
In the example above, a scatter plot is created with the variable billlengthmm mapped to the x position, and billdepthmm mapped to the y position with color mapped to species. Supported optional arguements include:
- size - this is the size of the marker
- alpha (or transparency), is set to a value between 0 and 1.
- strokecolor is the stroke color around the marker. https://juliagraphics.github.io/Colors.jl/stable/namedcolors/ colors can be chosen from any name on this list
- stroke this is the thickeness of the stroke around the marker
lims
¤
lims
allows the user to set the ranges for the x and y axises as shown in the example above.
geom_bar
, geom_col
, and geom_histogram
¤
geom_bar
is used to create bar plots for categorical data. geom_col
is a special case of geom_bar
where the height of the bars is already computed and does not need to be counted. geom_histogram
is used to create a histogram, which is essentially a bar plot for continuous data, where the data is divided into bins and the number of data points in each bin is counted.
ggplot(data=penguins, @aes(x=species)) +
geom_bar(aes(color = "island"), position = "dodge")
geom_bar optional arguments include
- color, alpha as above
- position, when set to "dodge," bar charts will not stack
ggplot(data=penguins, @aes(x = island, y=species)) +
geom_col()
ggplot() +
geom_histogram(data=penguins, @aes(x = bill_length_mm))
In the first example, a bar plot is created with the variable CategoricalVar mapped to the x position, and the count of each category is represented by the height of the bars.
In the second example, a column plot is created with the variable CategoricalVar mapped to the x position, and ComputedHeight mapped to the y position.
A histogram is created with the continuous variable, billlengthmm, mapped to the x position, and the data is divided into bins, with the count in each bin represented by the height of the bars.
geom_path
and geom_line
¤
The geom_path
and geom_line
are used to create line plots. geom_path
connects the data points in the order they appear in the data, while geom_line
connects the data points in order of the x-values.
x_values = 1:10;
y_values = x_values .^ 2;
df_line = DataFrame(X = x_values, Y = y_values);
ggplot(df_line) +
geom_line(@aes(x = X, y = Y)) +
labs(title = "Line Plot Example", x = "X axis", y = "Y axis")+
theme_dark()
geom_step
¤
geom_step
creates a step plot, which is similar to a line plot but with a step pattern rather than a direct line from point to point.
ggplot(df_line, @aes(x = X, y = Y)) +
geom_step() +
labs(title = "Step Plot Example", x = "X axis", y = "Y axis")+
theme_minimal()
geom_boxplot
¤
geom_boxplot
creates a boxplot.
ggplot()+
geom_boxplot(data=penguins, @aes(x = island, y = bill_length_mm), alpha = .3)
In this example, a boxplot is created where different island of penguins are mapped to the x position, and the bill length is mapped to the y position. Finally, the each species will be mapped to a different color
geom_boxplot supported optinal arguements currently include:
- color - if used within the aes() with a categorical variable it will make each category a different color as shown above. When used outside of the aes() and selected with a color, it will make each boxplot that color.
- alpha - transaparency as above, used outside of the aes()
geom_violin
¤
geom_violin
creates a violin plot, which is a combination of a boxplot and a kernel density plot.
ggplot(penguins, @aes(x = species, y = bill_depth_mm)) +
geom_violin()
In this example, a violin plot is created where different species of penguins are mapped to the x position, and the bill depth is mapped to the y position. geom_violin does not currently support mapping a categorical variable to colors.
geom_tile
¤
The geom_tile
creates a tile plot, also known as a heatmap.
x_values = repeat(1:5, inner = 5);
y_values = repeat(1:5, outer = 5);
values = x_values .* y_values;
df_tile = DataFrame(X = x_values, Y = y_values, Value = values);
ggplot(df_tile, @aes(x = X, y = Y, z = Value)) +
geom_tile() +
labs(title = "Tile Plot Example", x = "X axis", y = "Y axis")
geom_text
and geom_label
¤
geom_text
and geom_label
are used to add text and labels to a plot.
Combining plots¤
Similar to the patchwork
library in R, plots can be combined horizontally using the +
and |
operators and vertically with the /
operator.
plot = ggplot(df_tile, @aes(x = X, y = Y, z = Value)) +
geom_tile();
plot | plot
plot / plot
These extend beyond two plots, but must be used in functional form for |
and /
.
plot + plot + plot
/(plot, plot, plot)
Grids of plots can also be combined using parethesis to delimit grid boundaries.
((plot + plot + plot) | plot) / (plot / plot)
scale_x_continuous
, scale_y_continuous
¤
scale_x_continuous
and scale_y_continuous
can apply labels and scales, reverse, or adjust the ticks for the axis.
ggplot(penguins, @aes(x = body_mass_g, y = bill_length_mm)) +
geom_point() +
scale_x_continuous(
name="Mass (g)",
trans=log10
) +
scale_y_continuous(
name="Length",
reversed = true,
labels=label_number(;suffix="mm")
)
The trans
argument takes as input a function which takes in and outputs numerical values. and labels
argument can be a string specification from Format.jl
or a function which formats a list of strings. Available label generators are:
label_bytes
label_currency
label_date
label_log
label_number
label_ordinal
label_percent
label_pvalue
label_scientific
label_wrap
Each of these will accept keywords arguments to generate a label function compatible with the labels
argument.
scale_x_log10
, scale_y_log10
¤
scale_x_log10
and scale_y_log10
apply a base 10 logarithmic transformation to the x and y axes, respectively.
ggplot(penguins, @aes(x = body_mass_g, y = bill_length_mm)) +
geom_point() +
scale_x_log10()
In this example, a scatter plot is created where the body mass of penguins is mapped to the x position and the bill length to the y position. A base 10 logarithmic transformation is then applied to the x-axis.
scale_x_log2
, scale_y_log2
, scale_x_log
, scale_y_log
¤
These work similarly to the previous ones, but apply a base 2 or base e logarithmic transformation to the x and y axes, respectively.
scale_x_logit
, scale_y_logit
¤
scale_x_logit
and `scaleylogit apply a logit transformation to the x and y axes, respectively. This transformation is often used when visualizing proportions or probabilities.
scale_x_pseudolog10
, scale_y_pseudolog10
, scale_x_Symlog10
, scale_y_Symlog10
¤
These apply different types of logarithmic transformations to the x and y axes. The "pseudo" and "Symlog" transformations are designed to handle zeros and negative values more effectively.
scale_x_reverse
, scale_y_reverse
¤
scale_x_reverse
and scale_y_reverse
reverse the direction of the x and y axes, respectively.
ggplot(penguins, @aes(x = body_mass_g, y = bill_length_mm, color = species)) +
geom_point() +
scale_y_reverse() +
theme_minimal()
In this example, a scatter plot is created where the body mass of penguins is mapped to the x position and the bill length to the y position. The direction of the y-axis is then reversed.
scale_x_sqrt
, scale_y_sqrt
¤
scale_x_sqrt
and scale_y_sqrt
apply a square root transformation to the x and y axes, respectively.
ggplot(penguins, @aes(x = body_mass_g, y = bill_length_mm, color = species)) +
geom_point() +
scale_x_sqrt() +
theme_minimal()
In this example, a scatter plot is created where the body mass of penguins is mapped to the x position and the bill length to the y position. A square root transformation is then applied to the x-axis
geom_errorbar
¤
geom_errorbar
creates vertical and error bars .
categories = ["A", "B", "C", "D"];
n = length(categories);
mean_values = rand(n); # Random mean values for demonstration
errors = rand(n) / 2; # Random error values for demonstration
LowerBound = mean_values .- errors;
UpperBound = mean_values .+ errors;
df_errorbar = DataFrame(
Category = categories,
MeanValue = mean_values,
LowerBound = LowerBound,
UpperBound = UpperBound);
ggplot(df_errorbar, @aes(x = Category, y = MeanValue, ymin = LowerBound, ymax = UpperBound)) +
geom_point() + # to show the mean value
geom_errorbar(width = 0.2) + # width of the horizontal line at the top and bottom of the error bar
labs(title = "Error Bar Plot Example", x = "Category", y = "Mean Value")
ggsave
¤
ggsave
saves a GGPlot to the specified location.
plot = ggplot(penguins, @aes(x = body_mass_g, y = bill_length_mm, color = species)) +
geom_point()
ggsave(plot, "penguin_points.png")
CairoMakie.Screen{IMAGE}
In this example, plot
is saved to penguin_points.png
. Acceptable filetypes are all those supported by CairoMakie: svg
, pdf
, and png
.
This page was generated using Literate.jl.