Transformers and other unsupervised models
Several unsupervised models used to perform common transformations, such as one-hot encoding, are available in MLJ out-of-the-box. These are detailed in Built-in transformers below.
A transformer is static if it has no learned parameters. While such a transformer is tantamount to an ordinary function, realizing it as an MLJ static transformer (subtype of Static <: Unsupervised) can be useful, especially if the function depends on parameters the user would like to manipulate (which become hyper-parameters of the model). The necessary syntax for defining your own static transformers is described in Static transformers below.
Some unsupervised models, such as clustering algorithms, have a predict method in addition to a transform method. We give an example of this in Transformers that also predict below.
Finally, we note that models that fit a distribution (or, more generally, a sampler object) to some data, while sometimes viewed as unsupervised, are treated in MLJ as supervised models. See Models that learn a probability distribution for an example.
Built-in transformers
MLJModels.UnivariateStandardizer — Type

UnivariateStandardizer()

Unsupervised model for standardizing (whitening) univariate data.
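For instance, a minimal usage sketch (hypothetical data; the values shown assume the usual sample standard deviation):

using MLJ
stand = UnivariateStandardizer()
mach = fit!(machine(stand, [0.0, 2.0, 4.0]))
w = transform(mach, [0.0, 2.0, 4.0])  # ≈ [-1.0, 0.0, 1.0]
inverse_transform(mach, w)            # recovers [0.0, 2.0, 4.0]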
MLJModels.Standardizer — Type

Standardizer(; features=Symbol[],
               ignore=false,
               ordered_factor=false,
               count=false)

Unsupervised model for standardizing (whitening) the columns of tabular data. If features is unspecified, then all columns having Continuous element scitype are standardized. Otherwise, the features standardized are the Continuous features named in features (ignore=false) or the Continuous features not named in features (ignore=true). To allow standardization of Count or OrderedFactor features as well, set the appropriate flag to true.
Instead of supplying a features vector, a Bool-valued callable with one argument can also be specified. For example, specifying Standardizer(features = name -> name in [:x1, :x3], ignore = true, count=true) has the same effect as Standardizer(features = [:x1, :x3], ignore = true, count=true), namely to standardize all Continuous and Count features, with the exception of :x1 and :x3.
The inverse_transform method is supported provided count=false and ordered_factor=false at the time of fit.
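For instance, a minimal sketch under those defaults (hypothetical data):

X = (x = [10.0, 20.0, 30.0],)
mach = fit!(machine(Standardizer(), X))
W = transform(mach, X)      # (x = [-1.0, 0.0, 1.0],)
inverse_transform(mach, W)  # recovers (x = [10.0, 20.0, 30.0],)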
Example
X = (ordinal1 = [1, 2, 3],
ordinal2 = coerce([:x, :y, :x], OrderedFactor),
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = coerce(["Your father", "he", "is"], Multiclass));
stand1 = Standardizer();
julia> transform(fit!(machine(stand1, X)), X)
[ Info: Training Machine{Standardizer} @ 7…97.
(ordinal1 = [1, 2, 3],
ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
ordinal3 = [-1.0, 0.0, 1.0],
ordinal4 = [1.0, 0.0, -1.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)
stand2 = Standardizer(features=[:ordinal3, ], ignore=true, count=true);
julia> transform(fit!(machine(stand2, X)), X)
[ Info: Training Machine{Standardizer} @ 1…87.
(ordinal1 = [-1.0, 0.0, 1.0],
ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [1.0, 0.0, -1.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)

MLJModels.OneHotEncoder — Type

OneHotEncoder(; features=Symbol[],
                ignore=false,
                ordered_factor=true,
                drop_last=false)

Unsupervised model for one-hot encoding the Finite features (columns) of some table. If features is unspecified, all features with Finite element scitype are encoded. Otherwise, encoding is applied to all Finite features named in features (ignore=false) or all Finite features not named in features (ignore=true).
If ordered_factor=false then the above holds with Finite replaced with Multiclass, i.e., OrderedFactor features are not transformed.
Specify drop_last=true if the column for the last level of each categorical feature is to be dropped.
New data to be transformed may lack features present in the fit data, but no new features can be present.
Warning: This transformer assumes that levels(col) for any Multiclass or OrderedFactor column is the same in new data being transformed as it is in the data used to fit the transformer.
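For instance, here is a minimal sketch (hypothetical data) of both points: a fitted encoder tolerates an absent column, and slicing a CategoricalVector preserves its levels, so the warning is respected:

using MLJ
Xtrain = (x=categorical(["a", "b", "c"]), w=[1.0, 2.0, 3.0])
enc = fit!(machine(OneHotEncoder(), Xtrain))
Xnew = (x=Xtrain.x[1:2],)  # lacks `w`, but `x` retains levels ["a", "b", "c"]
transform(enc, Xnew)       # columns x__a, x__b, x__c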
Example
X = (name=categorical(["Danesh", "Lee", "Mary", "John"]),
grade=categorical([:A, :B, :A, :C], ordered=true),
height=[1.85, 1.67, 1.5, 1.67],
n_devices=[3, 2, 4, 3])
schema(X)
┌───────────┬─────────────────────────────────┬──────────────────┐
│ _.names │ _.types │ _.scitypes │
├───────────┼─────────────────────────────────┼──────────────────┤
│ name │ CategoricalValue{String,UInt32} │ Multiclass{4} │
│ grade │ CategoricalValue{Symbol,UInt32} │ OrderedFactor{3} │
│ height │ Float64 │ Continuous │
│ n_devices │ Int64 │ Count │
└───────────┴─────────────────────────────────┴──────────────────┘
_.nrows = 4
hot = OneHotEncoder(ordered_factor=true);
mach = fit!(machine(hot, X))
transform(mach, X) |> schema
┌──────────────┬─────────┬────────────┐
│ _.names │ _.types │ _.scitypes │
├──────────────┼─────────┼────────────┤
│ name__Danesh │ Float64 │ Continuous │
│ name__John │ Float64 │ Continuous │
│ name__Lee │ Float64 │ Continuous │
│ name__Mary │ Float64 │ Continuous │
│ grade__A │ Float64 │ Continuous │
│ grade__B │ Float64 │ Continuous │
│ grade__C │ Float64 │ Continuous │
│ height │ Float64 │ Continuous │
│ n_devices │ Int64 │ Count │
└──────────────┴─────────┴────────────┘
_.nrows = 4

MLJModels.ContinuousEncoder — Type

ContinuousEncoder(one_hot_ordered_factors=false, drop_last=false)

Unsupervised model for arranging all features (columns) of a table to have Continuous element scitype, by applying the following protocol to each feature ftr:
- If ftr is already Continuous, retain it.
- If ftr is Multiclass, one-hot encode it.
- If ftr is OrderedFactor, replace it with coerce(ftr, Continuous) (vector of floating point integers), unless one_hot_ordered_factors=true is specified, in which case one-hot encode it.
- If ftr is Count, replace it with coerce(ftr, Continuous).
- If ftr is of some other element scitype, or was not observed in fitting the encoder, drop it from the table.
If drop_last=true is specified, then one-hot encoding always drops the last class indicator column.
Warning: This transformer assumes that levels(col) for any Multiclass or OrderedFactor column is the same in new data being transformed as it is in the data used to fit the transformer.
Example
X = (name=categorical(["Danesh", "Lee", "Mary", "John"]),
grade=categorical([:A, :B, :A, :C], ordered=true),
height=[1.85, 1.67, 1.5, 1.67],
n_devices=[3, 2, 4, 3],
comments=["the force", "be", "with you", "too"])
schema(X)
┌───────────┬─────────────────────────────────┬──────────────────┐
│ _.names │ _.types │ _.scitypes │
├───────────┼─────────────────────────────────┼──────────────────┤
│ name │ CategoricalValue{String,UInt32} │ Multiclass{4} │
│ grade │ CategoricalValue{Symbol,UInt32} │ OrderedFactor{3} │
│ height │ Float64 │ Continuous │
│ n_devices │ Int64 │ Count │
│ comments │ String │ Textual │
└───────────┴─────────────────────────────────┴──────────────────┘
_.nrows = 4
cont = ContinuousEncoder(drop_last=true);
mach = fit!(machine(cont, X))
transform(mach, X) |> schema
┌──────────────┬─────────┬────────────┐
│ _.names │ _.types │ _.scitypes │
├──────────────┼─────────┼────────────┤
│ name__Danesh │ Float64 │ Continuous │
│ name__John │ Float64 │ Continuous │
│ name__Lee │ Float64 │ Continuous │
│ grade │ Float64 │ Continuous │
│ height │ Float64 │ Continuous │
│ n_devices │ Float64 │ Continuous │
└──────────────┴─────────┴────────────┘
_.nrows = 4

MLJModels.FeatureSelector — Type

FeatureSelector(features=Symbol[], ignore=false)

An unsupervised model for filtering features (columns) of a table. Only those features encountered during fitting will appear in transformed tables if features is empty (the default). Alternatively, if a non-empty features is specified, then only the specified features encountered during fitting are used (ignore=false), or all features encountered during fitting which are not named in features are used (ignore=true).
Throws an error if a recorded or specified feature is not present in the transformation input.
Instead of supplying a features vector, a Bool-valued callable with one argument can also be specified. For example, specifying FeatureSelector(features = name -> name in [:x1, :x3], ignore = true) has the same effect as FeatureSelector(features = [:x1, :x3], ignore = true), namely to select all features, with the exception of :x1 and :x3.
Example
julia> X = (ordinal1 = [1, 2, 3],
ordinal2 = coerce([:x, :y, :x], OrderedFactor),
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = coerce(["Your father", "he", "is"], Multiclass));
julia> select1 = FeatureSelector();
julia> transform(fit!(machine(select1, X)), X)
[ Info: Training Machine{FeatureSelector} @811.
(ordinal1 = [1, 2, 3],
ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)
julia> select2 = FeatureSelector(features=[:ordinal3, ], ignore=true);
julia> transform(fit!(machine(select2, X)), X)
[ Info: Training Machine{FeatureSelector} @721.
(ordinal1 = [1, 2, 3],
ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)
MLJModels.UnivariateBoxCoxTransformer — Type

UnivariateBoxCoxTransformer(; n=171, shift=false)

Unsupervised model specifying a univariate Box-Cox transformation of a single variable taking non-negative values, with a possible preliminary shift. Such a transformation is of the form
x -> ((x + c)^λ - 1)/λ   for λ ≠ 0
x -> log(x + c)          for λ = 0

On fitting to data, n different values of the Box-Cox exponent λ (between -0.4 and 3) are searched to fix the value maximizing normality. If shift=true and zero values are encountered in the data, then the transformation sought includes a preliminary positive shift c of 0.2 times the data mean. If there are no zero values, then no shift is applied.
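A minimal usage sketch (hypothetical data):

using MLJ
v = abs.(randn(100)) .+ 0.1            # positive, right-skewed data
bc = UnivariateBoxCoxTransformer(n=171, shift=true)
mach = fit!(machine(bc, v))
w = transform(mach, v)                 # more nearly normal
v_approx = inverse_transform(mach, w)  # approximately recovers v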
MLJModels.UnivariateDiscretizer — Type

UnivariateDiscretizer(n_classes=512)

Returns an MLJModel for discretizing any continuous vector v (scitype(v) <: AbstractVector{Continuous}), where n_classes describes the resolution of the discretization.
Transformed output w is a vector of ordered factors (scitype(w) <: AbstractVector{<:OrderedFactor}). Specifically, w is a CategoricalVector, with element type CategoricalValue{R,R}, where R<:Unsigned is optimized.
The transformation is chosen so that the vector on which the transformer is fit has, in transformed form, an approximately uniform distribution of values.
Example
using MLJ
t = UnivariateDiscretizer(n_classes=10)
discretizer = machine(t, randn(1000))
fit!(discretizer)
v = rand(10)
w = transform(discretizer, v)
v_approx = inverse_transform(discretizer, w) # reconstruction of v from w

MLJModels.FillImputer — Type

FillImputer(
    features = [],
    continuous_fill = e -> skipmissing(e) |> median,
    count_fill = e -> skipmissing(e) |> (f -> round(eltype(f), median(f))),
    finite_fill = e -> skipmissing(e) |> mode)

Imputes missing data with a fixed value computed on the non-missing values. A different imputing function can be specified for Continuous, Count and Finite data.
Fields
- continuous_fill: function to use on Continuous data, by default the median
- count_fill: function to use on Count data, by default the rounded median
- finite_fill: function to use on Multiclass and OrderedFactor data (including binary data), by default the mode
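A minimal usage sketch (hypothetical data):

using MLJ
X = (age = [23.0, missing, 34.0, 41.0],
     gender = categorical(["male", "male", missing, "female"]))
imputer = FillImputer()
mach = fit!(machine(imputer, X))
transform(mach, X)  # missing age filled with the median, gender with the mode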
Static transformers
The main use-case for static transformers is for insertion into a @pipeline or other exported learning network (see Composing Models). If a static transformer has no hyper-parameters, it is tantamount to an ordinary function. An ordinary function can be inserted directly into a @pipeline; the situation for learning networks is only slightly more complicated; see Static operations on nodes.
The following example defines a new model type Averager to perform the weighted average of two vectors (target predictions, for example). We suppose the weighting is normalized, and therefore controlled by a single hyper-parameter, mix.
mutable struct Averager <: Static
mix::Float64
end
import MLJBase
MLJBase.transform(a::Averager, _, y1, y2) = (1 - a.mix)*y1 + a.mix*y2

Important. Note the sub-typing <: Static.
Such static transformers with (unlearned) parameters can have arbitrarily many inputs, but only one output. In the single-input case, an inverse_transform can also be defined, as sketched below. Since static transformers have no real learned parameters, you bind one to a machine without specifying training arguments.
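For instance, here is a sketch of a hypothetical single-input static transformer supporting an inverse (Scaler is not part of MLJ):

mutable struct Scaler <: Static
    factor::Float64
end
MLJBase.transform(s::Scaler, _, v) = s.factor .* v
MLJBase.inverse_transform(s::Scaler, _, w) = w ./ s.factor

mach = machine(Scaler(2.0)) |> fit!
w = transform(mach, [1.0, 2.0])  # [2.0, 4.0]
inverse_transform(mach, w)       # recovers [1.0, 2.0]

Returning to the Averager, which we bind to a machine without training arguments: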
mach = machine(Averager(0.5)) |> fit!
transform(mach, [1, 2, 3], [3, 2, 1])
3-element Array{Float64,1}:
2.0
2.0
2.0

Let's see how we can include our Averager in a learning network (see Composing Models) to mix the predictions of two regressors, with one-hot encoding of the inputs:
X = source()
y = source() # MLJ will automatically infer this is a target node
ridge = @load RidgeRegressor pkg=MultivariateStats
knn = @load KNNRegressor
averager = Averager(0.5)
hotM = machine(OneHotEncoder(), X)
W = transform(hotM, X) # one-hot encode the input
ridgeM = machine(ridge, W, y)
y1 = predict(ridgeM, W)
knnM = machine(knn, W, y)
y2 = predict(knnM, W)
averagerM = machine(averager)
yhat = transform(averagerM, y1, y2)

Now we export to obtain a Deterministic composite model, and then instantiate it:
learning_mach = machine(Deterministic(), X, y; predict=yhat)
Machine{DeterministicSurrogate} @772 trained 0 times.
args:
1: Source @415 ⏎ `Unknown`
2: Source @389 ⏎ `Unknown`
@from_network learning_mach struct DoubleRegressor
regressor1=ridge
regressor2=knn
averager=averager
end
julia> composite = DoubleRegressor()
DoubleRegressor(
regressor1 = RidgeRegressor(
lambda = 1.0),
regressor2 = KNNRegressor(
K = 5,
algorithm = :kdtree,
metric = Distances.Euclidean(0.0),
leafsize = 10,
reorder = true,
weights = :uniform),
averager = Averager(
mix = 0.5)) @301
which can be evaluated like any other model:
composite.averager.mix = 0.25 # adjust mix from default of 0.5
julia> evaluate(composite, (@load_reduced_ames)..., measure=rms)
Evaluating over 6 folds: 100%[=========================] Time: 0:00:00
┌───────────┬───────────────┬────────────────────────────────────────────────────────┐
│ _.measure │ _.measurement │ _.per_fold │
├───────────┼───────────────┼────────────────────────────────────────────────────────┤
│ rms │ 26800.0 │ [21400.0, 23700.0, 26800.0, 25900.0, 30800.0, 30700.0] │
└───────────┴───────────────┴────────────────────────────────────────────────────────┘
_.per_observation = [missing]

Transformers that also predict
Commonly, clustering algorithms learn to label data by identifying a collection of "centroids" in the training data. Any new input observation is labeled with the cluster to which it is closest (this is the output of predict), while the vector of all distances from the centroids defines a lower-dimensional representation of the observation (the output of transform). In the following example, a K-means clustering algorithm assigns one of three labels 1, 2, 3 to the input features of the iris data set, and these labels are compared with the actual species recorded in the target (which is not seen by the algorithm).
import Random.seed!
seed!(123)
X, y = @load_iris;
model = @load KMeans pkg=ParallelKMeans
mach = machine(model, X) |> fit!
# transforming:
Xsmall = transform(mach);
julia> selectrows(Xsmall, 1:4) |> pretty
┌─────────────────────┬────────────────────┬────────────────────┐
│ x1 │ x2 │ x3 │
│ Float64 │ Float64 │ Float64 │
│ Continuous │ Continuous │ Continuous │
├─────────────────────┼────────────────────┼────────────────────┤
│ 0.0215920000000267 │ 25.314260355029603 │ 11.645232464391299 │
│ 0.19199200000001326 │ 25.882721893491123 │ 11.489658693899486 │
│ 0.1699920000000077 │ 27.58656804733728 │ 12.674412792260142 │
│ 0.26919199999998966 │ 26.28656804733727 │ 11.64392098898145 │
└─────────────────────┴────────────────────┴────────────────────┘
# predicting:
yhat = predict(mach);
compare = zip(yhat, y) |> collect;
compare[1:8]
8-element Array{Tuple{CategoricalValue{Int64,UInt32},CategoricalString{UInt32}},1}:
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
compare[51:58]
8-element Array{Tuple{CategoricalValue{Int64,UInt32},CategoricalString{UInt32}},1}:
(2, "versicolor")
(3, "versicolor")
(2, "versicolor")
(3, "versicolor")
(3, "versicolor")
(3, "versicolor")
(3, "versicolor")
(3, "versicolor")
compare[101:108]
8-element Array{Tuple{CategoricalValue{Int64,UInt32},CategoricalString{UInt32}},1}:
(2, "virginica")
(3, "virginica")
(2, "virginica")
(2, "virginica")
(2, "virginica")
(2, "virginica")
(3, "virginica")
(2, "virginica")