Transformers and other unsupervised models
Several unsupervised models used to perform common transformations, such as one-hot encoding, are available in MLJ out-of-the-box. These are detailed in Built-in transformers below.
A transformer is static if it has no learned parameters. While such a transformer is tantamount to an ordinary function, realizing it as an MLJ static transformer (a subtype of Static <: Unsupervised) can be useful, especially if the function depends on parameters the user would like to manipulate (these become hyper-parameters of the model). The necessary syntax for defining your own static transformers is described in Static transformers below.
Some unsupervised models, such as clustering algorithms, have a predict method in addition to a transform method. We give an example of this in Transformers that also predict below.
Finally, we note that models fitting a distribution, or more generally a sampler object, to some data, while sometimes viewed as unsupervised, are treated in MLJ as supervised models. See Models that learn a probability distribution for an example.
Built-in transformers
MLJModels.UnivariateStandardizer — Type

UnivariateStandardizer()

Unsupervised model for standardizing (whitening) univariate data.
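A minimal usage sketch, using the standard machine workflow (the data here is synthetic):

using MLJ

v = [0.0, 2.0, 4.0]
stand = UnivariateStandardizer()
mach = fit!(machine(stand, v))
w = transform(mach, v)      # ≈ [-1.0, 0.0, 1.0]
inverse_transform(mach, w)  # recovers v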
MLJModels.Standardizer — Type

Standardizer(; features=Symbol[],
               ignore=false,
               ordered_factor=false,
               count=false)
Unsupervised model for standardizing (whitening) the columns of tabular data. If features is unspecified, then all columns having Continuous element scitype are standardized. Otherwise, the features standardized are the Continuous features named in features (ignore=false) or the Continuous features not named in features (ignore=true). To allow standardization of Count or OrderedFactor features as well, set the appropriate flag to true.
Instead of supplying a features vector, a Bool-valued callable taking one argument can also be specified. For example, specifying Standardizer(features = name -> name in [:x1, :x3], ignore = true, count = true) has the same effect as Standardizer(features = [:x1, :x3], ignore = true, count = true), namely to standardize all Continuous and Count features, with the exception of :x1 and :x3.
The inverse_transform method is supported provided count=false and ordered_factor=false at the time of fit.
Example
X = (ordinal1 = [1, 2, 3],
ordinal2 = coerce([:x, :y, :x], OrderedFactor),
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = coerce(["Your father", "he", "is"], Multiclass));
stand1 = Standardizer();
julia> transform(fit!(machine(stand1, X)), X)
[ Info: Training Machine{Standardizer} @ 7…97.
(ordinal1 = [1, 2, 3],
ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
ordinal3 = [-1.0, 0.0, 1.0],
ordinal4 = [1.0, 0.0, -1.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)
stand2 = Standardizer(features=[:ordinal3, ], ignore=true, count=true);
julia> transform(fit!(machine(stand2, X)), X)
[ Info: Training Machine{Standardizer} @ 1…87.
(ordinal1 = [-1.0, 0.0, 1.0],
ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [1.0, 0.0, -1.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)
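Since stand1 was trained with count=false and ordered_factor=false (the defaults), the transformation can be inverted:

mach = fit!(machine(stand1, X));
W = transform(mach, X);
inverse_transform(mach, W)  # recovers the original X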
MLJModels.OneHotEncoder — Type

OneHotEncoder(; features=Symbol[],
                ignore=false,
                ordered_factor=true,
                drop_last=false)
Unsupervised model for one-hot encoding the Finite features (columns) of some table. If features is unspecified, all features with Finite element scitype are encoded. Otherwise, encoding is applied to all Finite features named in features (ignore=false) or all Finite features not named in features (ignore=true).
If ordered_factor=false, then the above holds with Finite replaced with Multiclass, i.e., OrderedFactor features are not transformed.

Specify drop_last=true if the column for the last level of each categorical feature is to be dropped.
New data to be transformed may lack features present in the fit data, but no new features can be present.
Warning: This transformer assumes that levels(col) for any Multiclass or OrderedFactor column is the same in new data being transformed as it is in the data used to fit the transformer.
Example
X = (name=categorical(["Danesh", "Lee", "Mary", "John"]),
grade=categorical([:A, :B, :A, :C], ordered=true),
height=[1.85, 1.67, 1.5, 1.67],
n_devices=[3, 2, 4, 3])
schema(X)
┌───────────┬─────────────────────────────────┬──────────────────┐
│ _.names │ _.types │ _.scitypes │
├───────────┼─────────────────────────────────┼──────────────────┤
│ name │ CategoricalValue{String,UInt32} │ Multiclass{4} │
│ grade │ CategoricalValue{Symbol,UInt32} │ OrderedFactor{3} │
│ height │ Float64 │ Continuous │
│ n_devices │ Int64 │ Count │
└───────────┴─────────────────────────────────┴──────────────────┘
_.nrows = 4
hot = OneHotEncoder(ordered_factor=true);
mach = fit!(machine(hot, X))
transform(mach, X) |> schema
┌──────────────┬─────────┬────────────┐
│ _.names │ _.types │ _.scitypes │
├──────────────┼─────────┼────────────┤
│ name__Danesh │ Float64 │ Continuous │
│ name__John │ Float64 │ Continuous │
│ name__Lee │ Float64 │ Continuous │
│ name__Mary │ Float64 │ Continuous │
│ grade__A │ Float64 │ Continuous │
│ grade__B │ Float64 │ Continuous │
│ grade__C │ Float64 │ Continuous │
│ height │ Float64 │ Continuous │
│ n_devices │ Int64 │ Count │
└──────────────┴─────────┴────────────┘
_.nrows = 4
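To drop the indicator column for the last level of each encoded feature, set drop_last=true. A minimal sketch, assuming the same X as above (given the levels fit above, we expect the name__Mary and grade__C columns to be dropped):

hot2 = OneHotEncoder(drop_last=true);
mach2 = fit!(machine(hot2, X))
transform(mach2, X) |> schema   # name__Mary and grade__C no longer appear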
MLJModels.ContinuousEncoder — Type

ContinuousEncoder(one_hot_ordered_factors=false, drop_last=false)
Unsupervised model for arranging all features (columns) of a table to have Continuous element scitype, by applying the following protocol to each feature ftr:

- If ftr is already Continuous, retain it.
- If ftr is Multiclass, one-hot encode it.
- If ftr is OrderedFactor, replace it with coerce(ftr, Continuous) (a vector of floating point integers), unless one_hot_ordered_factors=true is specified, in which case one-hot encode it.
- If ftr is Count, replace it with coerce(ftr, Continuous).
- If ftr is of some other element scitype, or was not observed in fitting the encoder, drop it from the table.
If drop_last=true is specified, then one-hot encoding always drops the last class indicator column.
Warning: This transformer assumes that levels(col) for any Multiclass or OrderedFactor column is the same in new data being transformed as it is in the data used to fit the transformer.
Example
X = (name=categorical(["Danesh", "Lee", "Mary", "John"]),
grade=categorical([:A, :B, :A, :C], ordered=true),
height=[1.85, 1.67, 1.5, 1.67],
n_devices=[3, 2, 4, 3],
comments=["the force", "be", "with you", "too"])
schema(X)
┌───────────┬─────────────────────────────────┬──────────────────┐
│ _.names │ _.types │ _.scitypes │
├───────────┼─────────────────────────────────┼──────────────────┤
│ name │ CategoricalValue{String,UInt32} │ Multiclass{4} │
│ grade │ CategoricalValue{Symbol,UInt32} │ OrderedFactor{3} │
│ height │ Float64 │ Continuous │
│ n_devices │ Int64 │ Count │
│ comments │ String │ Textual │
└───────────┴─────────────────────────────────┴──────────────────┘
_.nrows = 4
cont = ContinuousEncoder(drop_last=true);
mach = fit!(machine(cont, X))
transform(mach, X) |> schema
┌──────────────┬─────────┬────────────┐
│ _.names │ _.types │ _.scitypes │
├──────────────┼─────────┼────────────┤
│ name__Danesh │ Float64 │ Continuous │
│ name__John │ Float64 │ Continuous │
│ name__Lee │ Float64 │ Continuous │
│ grade │ Float64 │ Continuous │
│ height │ Float64 │ Continuous │
│ n_devices │ Float64 │ Continuous │
└──────────────┴─────────┴────────────┘
_.nrows = 4
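To one-hot encode the OrderedFactor feature instead of coercing it, set one_hot_ordered_factors=true. A minimal sketch, assuming the same X as above:

cont2 = ContinuousEncoder(one_hot_ordered_factors=true);
mach2 = fit!(machine(cont2, X))
transform(mach2, X) |> schema   # grade is replaced by grade__A, grade__B, grade__C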
MLJModels.FeatureSelector — Type

FeatureSelector(features=Symbol[], ignore=false)
An unsupervised model for filtering features (columns) of a table. Only those features encountered during fitting will appear in transformed tables if features is empty (the default). Alternatively, if a non-empty features is specified, then only the specified features encountered during fitting are used (ignore=false) or all features encountered during fitting which are not named in features are used (ignore=true).
Throws an error if a recorded or specified feature is not present in the transformation input.
Instead of supplying a features vector, a Bool-valued callable taking one argument can also be specified. For example, specifying FeatureSelector(features = name -> name in [:x1, :x3], ignore = true) has the same effect as FeatureSelector(features = [:x1, :x3], ignore = true), namely to select all features with the exception of :x1 and :x3.
Example
julia> X = (ordinal1 = [1, 2, 3],
ordinal2 = coerce([:x, :y, :x], OrderedFactor),
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = coerce(["Your father", "he", "is"], Multiclass));
julia> select1 = FeatureSelector();
julia> transform(fit!(machine(select1, X)), X)
[ Info: Training Machine{FeatureSelector} @811.
(ordinal1 = [1, 2, 3],
ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)
julia> select2 = FeatureSelector(features=[:ordinal3, ], ignore=true);
julia> transform(fit!(machine(select2, X)), X)
[ Info: Training Machine{FeatureSelector} @721.
(ordinal1 = [1, 2, 3],
ordinal2 = CategoricalValue{Symbol,UInt32}[:x, :y, :x],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)
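The callable form behaves the same way. A minimal sketch, assuming the same X as above:

julia> select3 = FeatureSelector(features = name -> name == :ordinal3, ignore=true);
julia> transform(fit!(machine(select3, X)), X)   # same output as for select2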
MLJModels.UnivariateBoxCoxTransformer — Type

UnivariateBoxCoxTransformer(; n=171, shift=false)
Unsupervised model specifying a univariate Box-Cox transformation of a single variable taking non-negative values, with a possible preliminary shift. Such a transformation is of the form

x -> ((x + c)^λ - 1)/λ  for λ ≠ 0
x -> log(x + c)         for λ = 0
On fitting to data, n different values of the Box-Cox exponent λ (between -0.4 and 3) are searched to find the value maximizing normality. If shift=true and zero values are encountered in the data, then the transformation sought includes a preliminary positive shift c of 0.2 times the data mean. If there are no zero values, then no shift is applied.
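A minimal usage sketch (the data here is synthetic, so numerical results will vary):

v = abs.(randn(1000)) .+ 0.1           # positive, right-skewed data
t = UnivariateBoxCoxTransformer(shift=true)
mach = fit!(machine(t, v))
w = transform(mach, v)                 # approximately normally distributed
v_approx = inverse_transform(mach, w)  # reconstruction of v from w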
MLJModels.UnivariateDiscretizer — Type

UnivariateDiscretizer(n_classes=512)

Returns an MLJModel for discretizing any continuous vector v (scitype(v) <: AbstractVector{Continuous}), where n_classes describes the resolution of the discretization.
Transformed output w is a vector of ordered factors (scitype(w) <: AbstractVector{<:OrderedFactor}). Specifically, w is a CategoricalVector, with element type CategoricalValue{R,R}, where R<:Unsigned is optimized.
The transformation is chosen so that the vector on which the transformer is fit has, in transformed form, an approximately uniform distribution of values.
Example
using MLJ
t = UnivariateDiscretizer(n_classes=10)
discretizer = machine(t, randn(1000))
fit!(discretizer)
v = rand(10)
w = transform(discretizer, v)
v_approx = inverse_transform(discretizer, w) # reconstruction of v from w
MLJModels.FillImputer — Type

FillImputer(
    features = [],
    continuous_fill = e -> skipmissing(e) |> median,
    count_fill = e -> skipmissing(e) |> (f -> round(eltype(f), median(f))),
    finite_fill = e -> skipmissing(e) |> mode)
Imputes missing data with a fixed value computed on the non-missing values. A different imputing function can be specified for Continuous, Count and Finite data.
Fields

- continuous_fill: function to use on Continuous data, by default the median
- count_fill: function to use on Count data, by default the rounded median
- finite_fill: function to use on Multiclass and OrderedFactor data (including binary data), by default the mode
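A minimal usage sketch (the table here is synthetic; the expected fills are shown as comments):

X = (x = [1.0, missing, 3.0],
     n = [2, missing, 4])
imputer = FillImputer()
mach = fit!(machine(imputer, X))
transform(mach, X)  # x: missing -> 2.0 (median); n: missing -> 3 (rounded median)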
Static transformers
The main use-case for static transformers is for insertion into a @pipeline or other exported learning network (see Composing Models). If a static transformer has no hyper-parameters, it is tantamount to an ordinary function. An ordinary function can be inserted directly into a @pipeline; the situation for learning networks is only slightly more complicated; see Static operations on nodes.

The following example defines a new model type Averager to perform the weighted average of two vectors (target predictions, for example). We suppose the weighting is normalized, and therefore controlled by a single hyper-parameter, mix.
mutable struct Averager <: Static
    mix::Float64
end

import MLJBase

# for Static models the second argument (the fitresult) is unused:
MLJBase.transform(a::Averager, _, y1, y2) = (1 - a.mix)*y1 + a.mix*y2
Important. Note the sub-typing <: Static.

Such static transformers with (unlearned) parameters can have arbitrarily many inputs, but only one output. In the single input case, an inverse_transform can also be defined. Since they have no real learned parameters, you bind a static transformer to a machine without specifying any training arguments:
mach = machine(Averager(0.5)) |> fit!
transform(mach, [1, 2, 3], [3, 2, 1])
3-element Array{Float64,1}:
2.0
2.0
2.0
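A single-input static transformer can additionally implement an inverse_transform. A minimal sketch (the Scaler type below is hypothetical, introduced only for illustration):

mutable struct Scaler <: Static
    factor::Float64
end

MLJBase.transform(s::Scaler, _, v) = s.factor .* v
MLJBase.inverse_transform(s::Scaler, _, w) = w ./ s.factor

scaler_mach = machine(Scaler(2.0)) |> fit!
w = transform(scaler_mach, [1.0, 2.0])  # [2.0, 4.0]
inverse_transform(scaler_mach, w)       # [1.0, 2.0]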
Let's see how we can include our Averager in a learning network (see Composing Models) to mix the predictions of two regressors, with one-hot encoding of the inputs:
X = source()
y = source()  # MLJ will automatically infer that this is a target node
ridge = @load RidgeRegressor pkg=MultivariateStats
knn = @load KNNRegressor
averager = Averager(0.5)
hotM = machine(OneHotEncoder(), X)
W = transform(hotM, X) # one-hot encode the input
ridgeM = machine(ridge, W, y)
y1 = predict(ridgeM, W)
knnM = machine(knn, W, y)
y2 = predict(knnM, W)
averagerM = machine(averager)
yhat = transform(averagerM, y1, y2)
Now we export to obtain a Deterministic composite model, and then instantiate the composite model:
learning_mach = machine(Deterministic(), X, y; predict=yhat)
Machine{DeterministicSurrogate} @772 trained 0 times.
args:
1: Source @415 ⏎ `Unknown`
2: Source @389 ⏎ `Unknown`
@from_network learning_mach struct DoubleRegressor
regressor1=ridge
regressor2=knn
averager=averager
end
julia> composite = DoubleRegressor()
DoubleRegressor(
regressor1 = RidgeRegressor(
lambda = 1.0),
regressor2 = KNNRegressor(
K = 5,
algorithm = :kdtree,
metric = Distances.Euclidean(0.0),
leafsize = 10,
reorder = true,
weights = :uniform),
averager = Averager(
mix = 0.5)) @301
which can be evaluated like any other model:
composite.averager.mix = 0.25 # adjust mix from default of 0.5
julia> evaluate(composite, (@load_reduced_ames)..., measure=rms)
Evaluating over 6 folds: 100%[=========================] Time: 0:00:00
┌───────────┬───────────────┬────────────────────────────────────────────────────────┐
│ _.measure │ _.measurement │ _.per_fold │
├───────────┼───────────────┼────────────────────────────────────────────────────────┤
│ rms │ 26800.0 │ [21400.0, 23700.0, 26800.0, 25900.0, 30800.0, 30700.0] │
└───────────┴───────────────┴────────────────────────────────────────────────────────┘
_.per_observation = [missing]
Transformers that also predict
Commonly, clustering algorithms learn to label data by identifying a collection of "centroids" in the training data. Any new input observation is labeled with the cluster to which it is closest (this is the output of predict), while the vector of all distances from the centroids defines a lower-dimensional representation of the observation (the output of transform). In the following example, a K-means clustering algorithm assigns one of three labels 1, 2, 3 to the input features of the iris data set, and these labels are compared with the actual species recorded in the target (which is not seen by the algorithm).
import Random.seed!
seed!(123)
X, y = @load_iris;
model = @load KMeans pkg=ParallelKMeans
mach = machine(model, X) |> fit!
# transforming:
Xsmall = transform(mach);
julia> selectrows(Xsmall, 1:4) |> pretty
┌─────────────────────┬────────────────────┬────────────────────┐
│ x1 │ x2 │ x3 │
│ Float64 │ Float64 │ Float64 │
│ Continuous │ Continuous │ Continuous │
├─────────────────────┼────────────────────┼────────────────────┤
│ 0.0215920000000267 │ 25.314260355029603 │ 11.645232464391299 │
│ 0.19199200000001326 │ 25.882721893491123 │ 11.489658693899486 │
│ 0.1699920000000077 │ 27.58656804733728 │ 12.674412792260142 │
│ 0.26919199999998966 │ 26.28656804733727 │ 11.64392098898145 │
└─────────────────────┴────────────────────┴────────────────────┘
# predicting:
yhat = predict(mach);
compare = zip(yhat, y) |> collect;
compare[1:8]
8-element Array{Tuple{CategoricalValue{Int64,UInt32},CategoricalString{UInt32}},1}:
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
(1, "setosa")
compare[51:58]
8-element Array{Tuple{CategoricalValue{Int64,UInt32},CategoricalString{UInt32}},1}:
(2, "versicolor")
(3, "versicolor")
(2, "versicolor")
(3, "versicolor")
(3, "versicolor")
(3, "versicolor")
(3, "versicolor")
(3, "versicolor")
compare[101:108]
8-element Array{Tuple{CategoricalValue{Int64,UInt32},CategoricalString{UInt32}},1}:
(2, "virginica")
(3, "virginica")
(2, "virginica")
(2, "virginica")
(2, "virginica")
(2, "virginica")
(3, "virginica")
(2, "virginica")
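The learned centroids can be inspected through the machine. A brief sketch (the precise structure of the returned named tuples depends on the ParallelKMeans implementation):

fitted_params(mach)  # learned parameters, including the cluster centers
report(mach)         # other training byproducts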