All Transformers

Summary Table

Transformer	Brief Description
Standardizer	Standardize columns of numerical features
UnivariateBoxCoxTransformer	Apply a Box-Cox transformation to a single vector
InteractionTransformer	Create new interaction features from columns of numerical features
UnivariateDiscretizer	Discretize a continuous vector into an ordered factor
FillImputer	Fill missing values of features belonging to any scientific type
UnivariateTimeTypeToContinuous	Transform a vector of time type into continuous type
UnivariateFillImputer	Fill in missing values in a single vector
OneHotEncoder	Encode categorical variables into one-hot vectors
ContinuousEncoder	Add type-casting functionality to OneHotEncoder
OrdinalEncoder	Encode categorical variables into ordered integers
FrequencyEncoder	Encode categorical variables into their normalized or unnormalized frequencies
TargetEncoder	Encode categorical variables into relevant target statistics
DummyEncoder	Encode by comparing each level to the reference level, the intercept being the cell mean of the reference group
SumEncoder	Encode by comparing each level to the reference level, the intercept being the grand mean
HelmertEncoder	Encode by comparing levels of a variable with the mean of the subsequent levels
ForwardDifferenceEncoder	Encode by comparing adjacent levels of a variable (each level minus the next level)
ContrastEncoder	Define a custom contrast encoder via a contrast matrix
HypothesisEncoder	Define a custom contrast encoder via a hypothesis matrix
EntityEmbedder	Encode categorical variables into dense embedding vectors
CardinalityReducer	Reduce cardinality of high cardinality categorical features by grouping infrequent categories
MissingnessEncoder	Encode missing values of categorical features into new values

MLJTransforms.Standardizer — Type

Standardizer

A model type for constructing a standardizer, based on MLJTransforms.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

Standardizer = @load Standardizer pkg=MLJTransforms

Do model = Standardizer() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in Standardizer(features=...).

Use this model to standardize (whiten) a Continuous vector, or relevant columns of a table. The rescalings applied by this transformer to new data are always those learned during the training phase, which are generally different from what would actually standardize the new data.
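To make the last point concrete, here is a minimal plain-Julia sketch (it does not call MLJ) of what "rescalings learned during the training phase" means. The helpers fit_scaling and apply_scaling are hypothetical and are not part of MLJTransforms. The sketch assumes the corrected (n - 1) standard deviation, which is what Statistics.std computes by default and which agrees with the numbers in the Examples section.

```julia
using Statistics  # mean, std (std uses the corrected, n - 1, estimator by default)

# Hypothetical helpers for illustration; not the MLJTransforms API.
# "Fitting" learns a location and a scale from the training data only.
fit_scaling(xtrain::AbstractVector{<:Real}) = (μ = mean(xtrain), σ = std(xtrain))

# "Transforming" applies the *learned* location and scale to any data.
apply_scaling(x::AbstractVector{<:Real}, p) = (x .- p.μ) ./ p.σ

xtrain = [10.0, 20.0, 30.0]
p = fit_scaling(xtrain)               # (μ = 20.0, σ = 10.0)

ztrain = apply_scaling(xtrain, p)     # [-1.0, 0.0, 1.0]: exactly standardized
xnew = [40.0, 50.0, 60.0]
znew = apply_scaling(xnew, p)         # [2.0, 3.0, 4.0]: mean 3.0, not 0.0
```

Re-fitting on xnew would give a different (μ, σ); transform never does this, it always reuses the parameters from training.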

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X)

where

X: any Tables.jl compatible table or any abstract vector with Continuous element scitype (any abstract float vector). By default, only features in a table with Continuous scitype are standardized (see the count and ordered_factor hyper-parameters below); check column scitypes with schema(X).

Train the machine using fit!(mach, rows=...).

Hyper-parameters

features: one of the following, with the behavior indicated below:
- [] (empty, the default): standardize all features (columns) having Continuous element scitype
- non-empty vector of feature names (symbols): standardize only the Continuous features in the vector (if ignore=false) or Continuous features not named in the vector (ignore=true).
- function or other callable: select a feature if the callable returns true on its name; selected features are standardized if ignore=false and left untouched if ignore=true. For example, Standardizer(features = name -> name in [:x1, :x3], ignore = true, count=true) has the same effect as Standardizer(features = [:x1, :x3], ignore = true, count=true), namely to standardize all Continuous and Count features, with the exception of :x1 and :x3.
Note that this behavior is further modified if the ordered_factor or count flags are set to true; see below.
ignore=false: whether to ignore or standardize specified features, as explained above
ordered_factor=false: if true, standardize any OrderedFactor feature wherever a Continuous feature would be standardized, as described above
count=false: if true, standardize any Count feature wherever a Continuous feature would be standardized, as described above
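The selection rule described by features, ignore, ordered_factor and count can be restated as a small predicate. This is a hypothetical sketch for illustration only: is_standardized is not an MLJTransforms function, and scitypes are represented by plain Symbols rather than ScientificTypes.jl types. Applied to the columns of the table X in the Examples section, it selects the same features that stand1 and stand2 standardize there.

```julia
# Hypothetical restatement of the documented feature-selection rule.
# Scitypes are plain Symbols here, e.g. :Continuous, :Count, :OrderedFactor.
function is_standardized(name::Symbol, scitype::Symbol;
                         features = Symbol[], ignore::Bool = false,
                         count::Bool = false, ordered_factor::Bool = false)
    # Only these scitypes are ever eligible for standardization
    eligible = scitype == :Continuous ||
               (count && scitype == :Count) ||
               (ordered_factor && scitype == :OrderedFactor)
    eligible || return false
    # Empty vector (the default): every eligible feature is standardized
    features isa AbstractVector && isempty(features) && return true
    # A vector of names and a callable both amount to a test on the name
    named = features isa AbstractVector ? name in features : features(name)
    return ignore ? !named : named
end

# Column names and scitypes of the table X in the Examples section
cols = [:ordinal1 => :Count, :ordinal2 => :OrderedFactor,
        :ordinal3 => :Continuous, :ordinal4 => :Continuous,
        :nominal => :Multiclass]
selected(; kw...) = [n for (n, s) in cols if is_standardized(n, s; kw...)]

selected()                                                     # [:ordinal3, :ordinal4]
selected(features = [:ordinal3], ignore = true, count = true)  # [:ordinal1, :ordinal4]
selected(features = n -> n in [:ordinal3], ignore = true, count = true)  # same
```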

Operations

transform(mach, Xnew): return Xnew with relevant features standardized according to the rescalings learned during fitting of mach.
inverse_transform(mach, Z): apply the inverse transformation to Z, so that inverse_transform(mach, transform(mach, Xnew)) is approximately the same as Xnew; unavailable if ordered_factor or count flags were set to true.

Fitted parameters

The fields of fitted_params(mach) are:

features_fit: the names of features that will be standardized
means: the corresponding means of the untransformed training data
stds: the corresponding standard deviations of the untransformed training data
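As a worked check on these fields, the following plain-Julia sketch recomputes the means and standard deviations of the two Continuous columns of the table in the Examples section, applies z = (x - mean)/std, and then the inverse map x = z*std + mean that inverse_transform must undo. Storing the parameters as parallel vectors named after the fields above is an assumption of this sketch; it does not call MLJ.

```julia
using Statistics  # mean, std

# Illustrative only: plain vectors standing in for features_fit, means, stds.
X = (ordinal3 = [10.0, 20.0, 30.0], ordinal4 = [-20.0, -30.0, -40.0])
features_fit = [:ordinal3, :ordinal4]
means = [mean(X[f]) for f in features_fit]   # [20.0, -30.0]
stds  = [std(X[f]) for f in features_fit]    # [10.0, 10.0]

standardize(x, μ, σ) = (x .- μ) ./ σ         # forward map
unstandardize(z, μ, σ) = z .* σ .+ μ         # its inverse

Z = [standardize(X[f], means[i], stds[i]) for (i, f) in enumerate(features_fit)]
Xback = [unstandardize(Z[i], means[i], stds[i]) for i in eachindex(features_fit)]
# Z == [[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]], matching the stand1 output
```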

Report

The fields of report(mach) are:

features_fit: the names of features that will be standardized

Examples

using MLJ

X = (ordinal1 = [1, 2, 3],
     ordinal2 = coerce([:x, :y, :x], OrderedFactor),
     ordinal3 = [10.0, 20.0, 30.0],
     ordinal4 = [-20.0, -30.0, -40.0],
     nominal = coerce(["Your father", "he", "is"], Multiclass));

julia> schema(X)
┌──────────┬──────────────────┐
│ names    │ scitypes         │
├──────────┼──────────────────┤
│ ordinal1 │ Count            │
│ ordinal2 │ OrderedFactor{2} │
│ ordinal3 │ Continuous       │
│ ordinal4 │ Continuous       │
│ nominal  │ Multiclass{3}    │
└──────────┴──────────────────┘

stand1 = Standardizer();

julia> transform(fit!(machine(stand1, X)), X)
(ordinal1 = [1, 2, 3],
 ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
 ordinal3 = [-1.0, 0.0, 1.0],
 ordinal4 = [1.0, 0.0, -1.0],
 nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)

stand2 = Standardizer(features=[:ordinal3, ], ignore=true, count=true);

julia> transform(fit!(machine(stand2, X)), X)
(ordinal1 = [-1.0, 0.0, 1.0],
 ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
 ordinal3 = [10.0, 20.0, 30.0],
 ordinal4 = [1.0, 0.0, -1.0],
 nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)

MLJTransforms.UnivariateStandardizer — Type

UnivariateStandardizer()

Transformer type for standardizing (whitening) single variable data.

This model may be deprecated in the future. Consider using Standardizer, which handles both tabular and univariate data.

MLJTransforms.UnivariateBoxCoxTransformer — Type

UnivariateBoxCoxTransformer

A model type for constructing a single variable Box-Cox transformer, based on MLJTransforms.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

UnivariateBoxCoxTransformer = @load UnivariateBoxCoxTransformer pkg=MLJTransforms

Do model = UnivariateBoxCoxTransformer() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in UnivariateBoxCoxTransformer(n=...).

Box-Cox transformations attempt to make data look more normally distributed. This can improve performance and assist in the interpretation of models that assume normally distributed data.

A Box-Cox transformation (with shift) is of the form

x -> ((x + c)^λ - 1)/λ

for some constant c and real λ, unless λ = 0, in which case the above is replaced with

x -> log(x + c)
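The two formulas above, together with their inverse (which inverse_transform has to undo), can be sketched in a few lines of plain Julia. The names boxcox and inv_boxcox are hypothetical helpers, not MLJTransforms functions, and this sketch simply rejects inputs with x + c <= 0.

```julia
# Hypothetical helpers for illustration; not the MLJTransforms API.
# Forward Box-Cox map with shift c: ((x + c)^λ - 1)/λ, or log(x + c) when λ == 0.
function boxcox(x::Real, λ::Real, c::Real)
    y = x + c
    y > 0 || throw(DomainError(y, "this sketch requires x + c > 0"))
    return λ == 0 ? log(y) : (y^λ - 1) / λ
end

# Inverse map: solve z = ((x + c)^λ - 1)/λ (or z = log(x + c)) for x.
inv_boxcox(z::Real, λ::Real, c::Real) = (λ == 0 ? exp(z) : (λ * z + 1)^(1 / λ)) - c

x = [0.5, 1.0, 2.0, 4.0]
z = boxcox.(x, 0.5, 0.0)          # λ = 0.5: a shifted and rescaled square root
xback = inv_boxcox.(z, 0.5, 0.0)  # recovers x up to rounding
```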

Given user-specified hyper-parameters n::Integer and shift::Bool, the present implementation learns the parameters c and λ from the training data as follows: If shift=true and zeros are encountered in the data, then c is set to 0.2 times the data mean. If there are no zeros, then no shift is applied. Finally, n different values of λ between -0.4 and 3 are considered, with λ fixed to the value maximizing normality of the transformed data.
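The fitting procedure just described amounts to a grid search over λ. In the sketch below, the normality score is a stand-in chosen for illustration, namely the absolute sample skewness of the transformed data (smaller counts as "more normal"); the criterion MLJTransforms actually uses is not given here and may differ. The function name fit_boxcox is hypothetical. As in the text, the shift c = 0.2 * mean is applied only when shift=true and the data contain zeros.

```julia
using Statistics  # mean, std

# Box-Cox map; |λ| below 1e-8 is treated as zero to avoid cancellation.
boxcox(x, λ, c) = abs(λ) < 1e-8 ? log(x + c) : ((x + c)^λ - 1) / λ

# Stand-in normality score (an assumption): absolute sample skewness.
function abs_skewness(v)
    m, s = mean(v), std(v; corrected = false)
    return abs(mean(((v .- m) ./ s) .^ 3))
end

# Hypothetical fitting routine following the documented steps.
# Assumes strictly positive data unless shift=true.
function fit_boxcox(v::AbstractVector{<:Real}; n::Integer = 171, shift::Bool = false)
    # Shift only if requested and zeros are present; otherwise c = 0
    c = (shift && any(iszero, v)) ? 0.2 * mean(v) : 0.0
    λs = range(-0.4, 3; length = n)    # n candidate exponents in [-0.4, 3]
    scores = [abs_skewness(boxcox.(v, λ, c)) for λ in λs]
    return (λ = λs[argmin(scores)], c = c)
end

# Squares of 1..20 become evenly spaced (zero skewness) when λ = 0.5
fitted = fit_boxcox((1.0:20.0) .^ 2)
```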

Reference: Wikipedia entry for power transform.

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, x)

where

x: any abstract vector with element scitype Continuous; check the scitype with scitype(x)

Train the machine using fit!(mach, rows=...).

Hyper-parameters

n=171: number of values of the exponent λ to try
shift=false: whether to include a preliminary constant translation in transformations, in the presence of zeros

Operations

transform(mach, xnew): apply the Box-Cox transformation learned when fitting mach
inverse_transform(mach, z): reconstruct the vector x whose Box-Cox transformation, as learned by mach, is z

Fitted parameters

The fields of fitted_params(mach) are:

λ: the learned Box-Cox exponent
c: the learned shift

Examples

using MLJ
using UnicodePlots
using Random
Random.seed!(123)

transf = UnivariateBoxCoxTransformer()

x = randn(1000).^2

mach = machine(transf, x)
fit!(mach)

z = transform(mach, x)

julia> histogram(x)
                ┌                                        ┐
   [ 0.0,  2.0) ┤███████████████████████████████████  848
   [ 2.0,  4.0) ┤████▌ 109
   [ 4.0,  6.0) ┤█▍ 33
   [ 6.0,  8.0) ┤▍ 7
   [ 8.0, 10.0) ┤▏ 2
   [10.0, 12.0) ┤  0
   [12.0, 14.0) ┤▏ 1
                └                                        ┘
                                 Frequency

julia> histogram(z)
                ┌                                        ┐
   [-5.0, -4.0) ┤█▎ 8
   [-4.0, -3.0) ┤████████▊ 64
   [-3.0, -2.0) ┤█████████████████████▊ 159
   [-2.0, -1.0) ┤█████████████████████████████▊ 216
   [-1.0,  0.0) ┤███████████████████████████████████  254
   [ 0.0,  1.0) ┤█████████████████████████▊ 188
   [ 1.0,  2.0) ┤████████████▍ 90
   [ 2.0,  3.0) ┤██▊ 20
   [ 3.0,  4.0) ┤▎ 1
                └                                        ┘
                                 Frequency

MLJTransforms.InteractionTransformer — Type

InteractionTransformer

A model type for constructing an interaction transformer, based on MLJTransforms.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

InteractionTransformer = @load InteractionTransformer pkg=MLJTransforms

Do model = InteractionTransformer() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in InteractionTransformer(order=...).

Generates all polynomial interaction terms up to the given order for the subset of chosen columns. Any column that contains elements with scitype <:Infinite is a valid basis to generate interactions. If features is not specified, all such columns with scitype <:Infinite in the table are used as a basis.
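As an illustration of "all polynomial interaction terms up to the given order", the following plain-Julia sketch forms the elementwise product of every subset of 2 up to order distinct columns. It rests on assumptions: interaction_terms and combinations_of are hypothetical helpers, only products of distinct columns (no powers of a single column) are generated, and new columns are named by joining the original names with "_".

```julia
# Hypothetical helpers for illustration; not the MLJTransforms API.
# All k-element combinations of xs, preserving the original order.
function combinations_of(xs::Vector{T}, k::Int) where {T}
    k == 0 && return [T[]]                  # one empty combination
    length(xs) < k && return Vector{T}[]    # not enough elements left
    head, rest = xs[1], xs[2:end]
    with_head = Vector{T}[vcat(head, c) for c in combinations_of(rest, k - 1)]
    return vcat(with_head, combinations_of(rest, k))
end

# Elementwise products of every combination of 2..order distinct features.
function interaction_terms(X::NamedTuple, features::Vector{Symbol}, order::Int)
    names, cols = Symbol[], Vector{Float64}[]
    for k in 2:order, combo in combinations_of(features, k)
        push!(names, Symbol(join(combo, "_")))
        push!(cols, float.(reduce((a, b) -> a .* b, (X[f] for f in combo))))
    end
    return NamedTuple{Tuple(names)}(Tuple(cols))
end

X = (A = [1, 2, 3], B = [4, 5, 6], C = [7, 8, 9])
terms = interaction_terms(X, [:A, :B, :C], 3)
# keys(terms) == (:A_B, :A_C, :B_C, :A_B_C)
```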

In MLJ or MLJBase, you can transform features X with the single call

transform(machine(model), X)