
Reference


Reference - Exported functions

# TidierCats.as_categorical - Method

as_categorical(arr::AbstractArray)

Converts the input array to a CategoricalArray.

Arguments

arr: Input array.

Returns

CategoricalArray constructed from the input array.

Examples

julia> arr = ["A", "B", "C", "A", "B", "B", "D", "E", missing]
9-element Vector{Union{Missing, String}}:
 "A"
 "B"
 "C"
 "A"
 "B"
 "B"
 "D"
 "E"
 missing

julia> as_categorical(arr)
9-element CategoricalArrays.CategoricalArray{Union{Missing, String},1,UInt32}:
 "A"
 "B"
 "C"
 "A"
 "B"
 "B"
 "D"
 "E"
 missing


# TidierCats.as_integer - Method

Converts a CategoricalValue or CategoricalArray to an integer or vector of integers.


# TidierCats.cat_collapse - Method

cat_collapse(cat_array::CategoricalArray, levels_map::Dict)

Collapses levels in a categorical variable column based on a provided mapping.

Arguments

  • cat_array: Categorical variable column to collapse.
  • levels_map: A dictionary with the original levels as keys and the new levels as values. Levels not in the keys are kept the same.

Returns

Categorical array with the levels collapsed.

Examples

julia> cat_array = CategoricalArray(["A", "B", "C", "D", "E"], ordered=true)
5-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "C"
 "D"
 "E"

julia> levels_map = Dict("A" => "A", "B" => "A", "C" => "C", "D" => "C", "E" => "E");

julia> cat_collapse(cat_array, levels_map)
5-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "A"
 "C"
 "C"
 "E"

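
The mapping rule above can be illustrated in plain Base Julia. This is a minimal sketch and is not TidierCats' implementation. `collapse_values` is a hypothetical helper that works on an ordinary string vector and returns plain values rather than a CategoricalArray.

```julia
# Hypothetical helper (not part of TidierCats): apply a level mapping
# to a plain vector of strings.
function collapse_values(xs::AbstractVector{<:AbstractString}, levels_map::Dict)
    # Look each value up in the map; values without an entry are kept as-is.
    return [get(levels_map, x, x) for x in xs]
end

vals = ["A", "B", "C", "D", "E"]
levels_map = Dict("A" => "A", "B" => "A", "C" => "C", "D" => "C", "E" => "E")
collapsed = collapse_values(vals, levels_map)
println(collapsed)  # ["A", "A", "C", "C", "E"]
```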

# TidierCats.cat_expand - Method

cat_expand(cat_array::CategoricalArray, new_levels...; after=Inf)

Expands the levels in a categorical array by adding new levels at a specified position.

Arguments

  • cat_array: Categorical array whose levels are expanded
  • new_levels: New levels to be added to the categorical array
  • after: Position after which to insert the new levels. Default is Inf, which appends the new levels at the end.

Returns

Categorical array with the new levels added.

Examples

julia> cats = CategoricalArray(["a", "b", "c", "a", "c", "b"]);

julia> println("Original levels: ", levels(cats))
Original levels: ["a", "b", "c"]

julia> cats = cat_expand(cats, "d", "e", "f");

julia> println("Expanded levels: ", levels(cats))
Expanded levels: ["a", "b", "c", "d", "e", "f"]

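
The `after` argument can be sketched on a plain level list in Base Julia. This sketch assumes that `after` counts existing levels from the front and that `Inf` means "append at the end". `expand_levels` is a hypothetical helper, not the package's code. It skips levels that already exist because a level list cannot hold duplicates.

```julia
# Hypothetical helper (not part of TidierCats): insert new levels into a
# plain level list after position `after`; Inf appends at the end.
function expand_levels(levs::Vector{String}, new_levels::String...; after = Inf)
    # Inf means "append"; otherwise clamp the position to 0:length(levs).
    pos = isinf(after) ? length(levs) : clamp(Int(after), 0, length(levs))
    # A level list cannot hold duplicates, so skip levels already present.
    additions = String[l for l in new_levels if !(l in levs)]
    return vcat(levs[1:pos], additions, levs[pos+1:end])
end

println(expand_levels(["a", "b", "c"], "d", "e", "f"))   # ["a", "b", "c", "d", "e", "f"]
println(expand_levels(["a", "b", "c"], "z"; after = 1))  # ["a", "z", "b", "c"]
```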

# TidierCats.cat_infreq - Method

cat_infreq(cat_array)

Orders the levels in a categorical array by their frequency, with the most common level first.

Arguments

cat_array: Input categorical array.

Returns

Categorical array with levels reordered by frequency.

Examples

julia> cat_array = CategoricalArray(["A", "B", "B"], ordered=true)
3-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "B"

julia> cat_infreq(cat_array)
3-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "B"

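
Reordering by frequency changes only the level order, not the values. This is why the printed array above looks unchanged. The Base Julia sketch below computes such a frequency order. `infreq_levels` is a hypothetical helper. Breaking ties alphabetically is an assumption of this sketch, not documented behavior.

```julia
# Hypothetical helper (not part of TidierCats): distinct values ordered
# from most to least frequent.
function infreq_levels(xs::AbstractVector)
    counts = Dict{eltype(xs), Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    # Sort by descending count; ties are broken alphabetically (an assumption).
    return sort(collect(keys(counts)); by = l -> (-counts[l], l))
end

println(infreq_levels(["A", "B", "B"]))  # ["B", "A"]
```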

# TidierCats.cat_lump - Method

cat_lump(cat_array, n::Int)

Orders the levels in a categorical array by their frequency and keeps only the 'n' most common levels. All other levels are replaced by "Other".

Arguments

  • cat_array: Input categorical array.
  • n: Number of levels to keep as they are; the rest become "Other".

Returns

Categorical array with the least frequent levels lumped as "Other".

Examples

julia> cat_array = CategoricalArray(["A", "B", "C", "A", "B", "B", "D", "E", "F"], ordered=true)
9-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "C"
 "A"
 "B"
 "B"
 "D"
 "E"
 "F"

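
The example above stops before calling cat_lump. The hypothetical Base Julia helper below roughly illustrates the described behavior: it keeps the `n` most frequent values and replaces the rest with "Other". How TidierCats breaks ties at the cutoff is not documented here. This sketch breaks them alphabetically, and it returns a plain vector rather than a CategoricalArray.

```julia
# Hypothetical helper (not part of TidierCats): keep the n most frequent
# values and replace all others with `other_level`.
function lump_top_n(xs::AbstractVector{<:AbstractString}, n::Int; other_level::String = "Other")
    counts = Dict{String, Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    # Rank by descending count; ties broken alphabetically (an assumption).
    ranked = sort(collect(keys(counts)); by = l -> (-counts[l], l))
    keep = Set(ranked[1:min(n, length(ranked))])
    return [x in keep ? x : other_level for x in xs]
end

lumped = lump_top_n(["A", "B", "C", "A", "B", "B", "D", "E", "F"], 2)
println(lumped)  # ["A", "B", "Other", "A", "B", "B", "Other", "Other", "Other"]
```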

# TidierCats.cat_lump_min - Function

cat_lump_min(cat_array, min::Int, other_level::String = "Other")

Lumps infrequent levels in a categorical array into an 'other' level based on minimum count.

Arguments

  • cat_array: Categorical array to lump
  • min: Minimum count threshold. Levels with counts below this will be lumped.
  • other_level: The level name to lump infrequent levels into. Default is "Other".

Returns

Categorical array with levels lumped.

Examples

julia> cat_array = CategoricalArray(["A", "B", "B", "C", "C", "D"])
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "B"
 "C"
 "C"
 "D"

julia> cat_lump_min(cat_array, 2)
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "Other"
 "B"
 "B"
 "C"
 "C"
 "Other"

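
The documented rule lumps a level when its count is strictly below `min`. The hypothetical Base Julia helper `lump_min` sketches that rule on a plain string vector. It is not the package's code. The threshold is named `min_count` to avoid shadowing Base's `min` function.

```julia
# Hypothetical helper (not part of TidierCats): replace values whose
# count is below `min_count` with `other_level`.
function lump_min(xs::AbstractVector{<:AbstractString}, min_count::Int; other_level::String = "Other")
    counts = Dict{String, Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    return [counts[x] < min_count ? other_level : x for x in xs]
end

println(lump_min(["A", "B", "B", "C", "C", "D"], 2))  # ["Other", "B", "B", "C", "C", "Other"]
```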

# TidierCats.cat_lump_prop - Function

cat_lump_prop(cat_array, prop::Float64, other_level::String = "Other")

Lumps infrequent levels in a categorical array into an 'other' level based on proportion threshold.

Arguments

  • cat_array: Categorical array to lump
  • prop: Proportion threshold. Levels with proportions below this will be lumped.
  • other_level: The level name to lump infrequent levels into. Default is "Other".

Returns

Categorical array with levels lumped based on proportion.

Examples

julia> cat_array = CategoricalArray(["A", "B", "B", "C", "C", "D"])
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "B"
 "C"
 "C"
 "D"

julia> cat_lump_prop(cat_array, 0.3)
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "Other"
 "B"
 "B"
 "C"
 "C"
 "Other"

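
With six observations, A and D each have a share of 1/6 (about 0.17), which is below 0.3. B and C each have 2/6 (about 0.33), which is not. The hypothetical Base Julia helper `lump_prop` sketches this proportion rule on a plain vector. It is not TidierCats' implementation.

```julia
# Hypothetical helper (not part of TidierCats): replace values whose share
# of all observations is below `prop` with `other_level`.
function lump_prop(xs::AbstractVector{<:AbstractString}, prop::Float64; other_level::String = "Other")
    counts = Dict{String, Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    total = length(xs)
    return [counts[x] / total < prop ? other_level : x for x in xs]
end

println(lump_prop(["A", "B", "B", "C", "C", "D"], 0.3))  # ["Other", "B", "B", "C", "C", "Other"]
```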

# TidierCats.cat_other - Method

cat_other(cat_array::CategoricalArray, other_level::String="Other")

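Replaces levels in a categorical array with the 'other' level. In the example below, the levels passed in the `drop` keyword are replaced and all other levels are kept.

Arguments

  • cat_array: Categorical array whose levels are replaced
  • drop: Levels to replace with other_level (passed as a keyword, as in the example below)
  • other_level: The level name to use for the replaced levels. Default is "Other".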

Returns

Categorical array with all levels replaced by the 'other' level.

Examples

julia> cat_array = CategoricalArray(["A", "B", "C", "D", "E"]);

julia> cat_other(cat_array, drop = ["A", "B"])
5-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "Other"
 "Other"
 "C"
 "D"
 "E"

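
The following Base Julia sketch mirrors the `drop` example above on a plain string vector. `other_values` is a hypothetical helper, not TidierCats' implementation.

```julia
# Hypothetical helper (not part of TidierCats): replace every value listed
# in `drop` with `other_level`; all other values are kept.
function other_values(xs::AbstractVector{<:AbstractString}, drop; other_level::String = "Other")
    dropset = Set(drop)
    return [x in dropset ? other_level : x for x in xs]
end

println(other_values(["A", "B", "C", "D", "E"], ["A", "B"]))  # ["Other", "Other", "C", "D", "E"]
```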

# TidierCats.cat_recode - Method

cat_recode(cat_array::Union{CategoricalArray, AbstractVector}; kwargs...)

Recodes the levels in a categorical array based on a provided mapping.

Arguments

  • cat_array: Categorical array to recode
  • kwargs: Keyword arguments of the form new_level = [old_level, ...]; every listed original level is recoded to the keyword's name. Levels that are not listed are kept the same.

Returns

Categorical array with the levels recoded.

Examples

julia> x = CategoricalArray(["apple", "tomato", "banana", "dear"]);

julia> println(levels(cat_recode(x, fruit = ["apple", "banana"], nothing = ["tomato"])))
["fruit", "nothing", "dear"]

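
The keyword form maps each new level to a list of original levels. The hypothetical Base Julia helper below inverts that into a lookup table. It is a sketch of the documented rule, not the package's code. It uses `veg` instead of `nothing` as a keyword name to keep the example unambiguous.

```julia
# Hypothetical helper (not part of TidierCats): each keyword names a new
# level and lists the original values that map to it.
function recode_values(xs::AbstractVector{<:AbstractString}; kwargs...)
    lookup = Dict{String, String}()
    for (new_level, old_levels) in kwargs
        for old in old_levels
            lookup[old] = String(new_level)
        end
    end
    # Values not listed in any keyword are kept unchanged.
    return [get(lookup, x, x) for x in xs]
end

x = ["apple", "tomato", "banana", "dear"]
println(recode_values(x; fruit = ["apple", "banana"], veg = ["tomato"]))  # ["fruit", "veg", "fruit", "dear"]
```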

# TidierCats.cat_relevel - Method

cat_relevel(cat_array::CategoricalArray, levels_order::Vector{String}, after::Int=0)

Reorders the levels in a categorical array according to the provided order.

Arguments

  • cat_array: Input categorical array.
  • levels_order: Vector of levels in the desired order.
  • after: Position among the remaining levels after which levels_order is placed. Default is 0, which moves levels_order to the front.

Returns

Categorical array with levels reordered according to levels_order.

Examples

julia> cat_array = CategoricalArray(["A", "B", "C", "A", "B", "B"], ordered=true)
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "C"
 "A"
 "B"
 "B"

julia> println(levels(cat_relevel(cat_array, ["B", "A", "C"])))
["B", "A", "C"]

julia> println(levels(cat_relevel(cat_array, ["A"], after=1)))
["B", "A", "C"]

julia> cat_array = CategoricalArray(["A", "B", "C", "A", "B", missing], ordered=true);

julia> println(levels(cat_relevel(cat_array, ["C", "A", "B", missing]), skipmissing=false))
Union{Missing, String}["C", "A", "B", missing]

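
The `after` rule below is inferred from the examples above. The listed levels are removed and then reinserted after position `after` among the remaining levels, which keep their relative order. `relevel_levels` is a hypothetical Base Julia helper that works on a plain list of string levels. It does not handle `missing`.

```julia
# Hypothetical helper (not part of TidierCats): move `levels_order` to sit
# after position `after` among the remaining levels.
function relevel_levels(levs::Vector{String}, levels_order::Vector{String}; after::Int = 0)
    # Levels not mentioned in levels_order keep their relative order.
    rest = String[l for l in levs if !(l in levels_order)]
    pos = clamp(after, 0, length(rest))
    return vcat(rest[1:pos], levels_order, rest[pos+1:end])
end

println(relevel_levels(["A", "B", "C"], ["B", "A", "C"]))   # ["B", "A", "C"]
println(relevel_levels(["A", "B", "C"], ["A"]; after = 1))  # ["B", "A", "C"]
```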

# TidierCats.cat_reorder - Function

cat_reorder(cat_var::AbstractVector, order_var::AbstractVector, fun::String, desc::Bool=true)

Reorders the levels in a categorical variable column based on a summary statistic calculated from another variable.

Arguments

  • cat_var: Categorical variable column to reorder.
  • order_var: Variable to calculate the summary statistic from.
  • fun: Name of the summary statistic, given as a string. Options are "mean" and "median".
  • desc: If true (the default), the levels are ordered by descending summary statistic.

Returns

Categorical array with the levels reordered.

Examples

julia> cat_var = String["A", "B", "A", "B", "A", "B", "C", "C", "C"];
       order_var = [1, 2, 3, 4, 5, 6, 7, 8, 9];
       cat_reorder(cat_var, order_var, "mean")
9-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "A"
 "B"
 "A"
 "B"
 "C"
 "C"
 "C"

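
The new level order is not visible from the printed values above. With "mean", groups A, B, and C average 3, 4, and 8, so the descending level order is C, B, A. The hypothetical Base Julia helper below computes each category's summary and sorts by it. It is a sketch of the described behavior, not TidierCats' code. The median is computed by hand to avoid depending on the Statistics package.

```julia
# Median of a non-empty vector, computed with Base only.
function median_of(v::Vector{Float64})
    s = sort(v)
    n = length(s)
    return isodd(n) ? s[div(n + 1, 2)] : (s[div(n, 2)] + s[div(n, 2) + 1]) / 2
end

# Hypothetical helper (not part of TidierCats): order categories by a
# summary statistic of order_var.
function reorder_levels(cat_var::AbstractVector, order_var::AbstractVector{<:Real},
                        fun::String, desc::Bool = true)
    # Collect the order_var values belonging to each category.
    groups = Dict{eltype(cat_var), Vector{Float64}}()
    for (c, o) in zip(cat_var, order_var)
        push!(get!(groups, c, Float64[]), o)
    end
    summarize = if fun == "mean"
        v -> sum(v) / length(v)
    elseif fun == "median"
        median_of
    else
        error("fun must be \"mean\" or \"median\"")
    end
    stats = Dict(k => summarize(v) for (k, v) in groups)
    # Sort categories by their statistic, descending when desc is true.
    return sort(collect(keys(stats)); by = k -> stats[k], rev = desc)
end

cat_var = ["A", "B", "A", "B", "A", "B", "C", "C", "C"]
order_var = [1, 2, 3, 4, 5, 6, 7, 8, 9]
println(reorder_levels(cat_var, order_var, "mean"))  # ["C", "B", "A"]
```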

# TidierCats.cat_replace_missing - Method

cat_replace_missing(cat_array::CategoricalArray, missing_level::String="missing")

Replaces missing values in a categorical array with a named level.

Arguments

  • cat_array: Categorical array whose missing values are replaced
  • missing_level: The level name to use in place of missing. Default is "missing".

Returns

Categorical array with missing values replaced by missing_level.

Examples

julia> cat_array = CategoricalArray(["a", "b", missing, "a", missing, "c"]);

julia> cat_replace_missing(cat_array)
6-element CategoricalArray{Union{Missing, String},1,UInt32}:
 "a"
 "b"
 "missing"
 "a"
 "missing"
 "c"

julia> cat_replace_missing(cat_array, "unknown")
6-element CategoricalArray{Union{Missing, String},1,UInt32}:
 "a"
 "b"
 "unknown"
 "a"
 "unknown"
 "c"

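
In Base Julia, the same substitution can be sketched with `coalesce`, which returns its first non-missing argument. `replace_missing_values` is a hypothetical helper that works on a plain vector. It does not build a CategoricalArray.

```julia
# Hypothetical helper (not part of TidierCats): replace each missing
# value with `missing_level`, leaving other values unchanged.
function replace_missing_values(xs::AbstractVector, missing_level::String = "missing")
    return coalesce.(xs, missing_level)
end

vals = ["a", "b", missing, "a", missing, "c"]
println(replace_missing_values(vals))             # ["a", "b", "missing", "a", "missing", "c"]
println(replace_missing_values(vals, "unknown"))  # ["a", "b", "unknown", "a", "unknown", "c"]
```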

# TidierCats.cat_rev - Method

cat_rev(cat_array)

Reverses the order of levels in a categorical array.

Arguments

cat_array: Input categorical array.

Returns

Categorical array with reversed order of levels.

Examples

julia> cat_array = CategoricalArray(["A", "B", "C", "A", "B", "B"], ordered=true)
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "C"
 "A"
 "B"
 "B"

julia> cat_rev(cat_array)
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "A"
 "B"
 "C"
 "A"
 "B"
 "B"

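
cat_rev changes only the level order, so the printed values above match the input. The Base Julia sketch below makes the effect visible. It assumes the default levels are the sorted unique values, as in the cat_expand example. It then reverses that list and sorts the data by level position. The names are illustrative and not part of TidierCats.

```julia
vals = ["A", "B", "C", "A", "B", "B"]
# Assumption: the default levels are the sorted unique values.
levs = sort(unique(vals))   # ["A", "B", "C"]
rev_levs = reverse(levs)    # ["C", "B", "A"]
# The data are untouched; sorting by level position shows the new order.
by_level = sort(vals; by = v -> findfirst(==(v), rev_levs))
println(by_level)  # ["C", "B", "B", "B", "A", "A"]
```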

Reference - Internal functions