Built-in Transformers
MLJ.Transformers.FeatureSelector (Type)
FeatureSelector(features=Symbol[])
An unsupervised model for filtering features (columns) of a table. If features is empty (the default), only those features encountered during fitting will appear in transformed tables. Alternatively, if a non-empty features is specified, then only the specified features are used. Throws an error if a recorded or specified feature is not present in the transformation input.
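The fit/transform logic just described can be sketched in plain Julia, assuming a table represented as a NamedTuple of equal-length column vectors. The names fit_selector and transform_selector are hypothetical illustrations, not MLJ's API or its actual implementation.

```julia
# Hypothetical sketch of FeatureSelector-style behaviour (not MLJ's implementation).
# A "table" here is a NamedTuple of equal-length column vectors.

# Fitting: record the features to keep. With an empty `features`,
# every column present in the training table is recorded.
function fit_selector(features::Vector{Symbol}, X::NamedTuple)
    return isempty(features) ? collect(keys(X)) : copy(features)
end

# Transforming: keep only the recorded features, erroring if any is absent.
function transform_selector(recorded::Vector{Symbol}, X::NamedTuple)
    for f in recorded
        haskey(X, f) || error("feature $f not present in input table")
    end
    return NamedTuple{Tuple(recorded)}(Tuple(X[f] for f in recorded))
end

Xtrain = (a = [1, 2], b = [3.0, 4.0])
Xnew   = (a = [5, 6], b = [7.0, 8.0], c = ["x", "y"])

recorded = fit_selector(Symbol[], Xtrain)   # [:a, :b]
Xout = transform_selector(recorded, Xnew)   # keeps :a and :b, drops :c
```

Passing Xtrain's recorded features to a table lacking column :b would raise an error, mirroring the behaviour stated above.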
UnivariateStandardizer()
Unsupervised model for standardizing (whitening) univariate data.
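As a rough illustration of what standardizing univariate data involves, the sketch below records the mean μ and standard deviation σ of the training data and maps each value x to (x - μ)/σ. The function names are hypothetical, and the sketch assumes the corrected (n - 1) estimator used by Statistics.std; it is not MLJ's implementation. With these data the results agree with the x1 column of the Standardizer example in this section.

```julia
using Statistics

# Hypothetical sketch: learn the parameters (μ, σ) of a univariate standardization.
fit_standardizer(v::AbstractVector{<:Real}) = (mean(v), std(v))

# Apply the learned transformation, and its inverse.
standardize(params, v) = (v .- params[1]) ./ params[2]
unstandardize(params, z) = z .* params[2] .+ params[1]

v = [0.2, 0.3, 1.0]
params = fit_standardizer(v)   # (0.5, ≈0.43589)
z = standardize(params, v)     # ≈ [-0.688247, -0.458831, 1.14708]
```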
MLJ.Transformers.Standardizer (Type)
Standardizer(; features=Symbol[])
Unsupervised model for standardizing (whitening) the columns of tabular data. If features is empty, then all columns v for which all elements have Continuous scitype are standardized. For different behaviour, specify the names of the features to be standardized.
using MLJ, DataFrames
X = DataFrame(x1=[0.2, 0.3, 1.0], x2=[4, 2, 3])
stand_model = Standardizer()
# x1 (Continuous) is standardized; x2 (integers, Count scitype) is left unchanged
transform(fit!(machine(stand_model, X)), X)
3×2 DataFrame
│ Row │ x1        │ x2    │
│     │ Float64   │ Int64 │
├─────┼───────────┼───────┤
│ 1   │ -0.688247 │ 4     │
│ 2   │ -0.458831 │ 2     │
│ 3   │ 1.14708   │ 3     │
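The column-selection rule can be mimicked outside MLJ. In this hypothetical sketch a table is a NamedTuple of columns, and a column counts as Continuous when its element type is a subtype of AbstractFloat; that check is only a stand-in for MLJ's scitype machinery, which is more general. With the same data it reproduces the table above: x1 is standardized while the integer column x2 passes through.

```julia
using Statistics

# Hypothetical stand-in for a `Continuous` scitype check.
is_continuous(col) = eltype(col) <: AbstractFloat

# Standardize every column passing the check (or only those named in
# `features`, when non-empty); leave all other columns unchanged.
function standardize_table(X::NamedTuple; features::Vector{Symbol}=Symbol[])
    selected(name, col) = isempty(features) ? is_continuous(col) : name in features
    cols = map(keys(X)) do name
        col = X[name]
        selected(name, col) ? (col .- mean(col)) ./ std(col) : col
    end
    return NamedTuple{keys(X)}(cols)
end

X = (x1 = [0.2, 0.3, 1.0], x2 = [4, 2, 3])
Xs = standardize_table(X)   # x1 standardized, x2 unchanged
```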
UnivariateBoxCoxTransformer(; n=171, shift=false)
Unsupervised model specifying a univariate Box-Cox transformation of a single variable taking non-negative values, with a possible preliminary shift. Such a transformation is of the form
x -> ((x + c)^λ - 1)/λ for λ not 0
x -> log(x + c) for λ = 0
On fitting to data, n different values of the Box-Cox exponent λ (between -0.4 and 3) are searched to find the value maximizing normality. If shift=true and zero values are encountered in the data, then the transformation sought includes a preliminary positive shift c of 0.2 times the data mean. If there are no zero values, then no shift is applied.
See also BoxCoxEstimator, a transformer for selected ordinals in an iterable table.
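The fitting procedure described above (a grid of n exponents between -0.4 and 3, plus an optional shift of 0.2 times the mean when zeros are present) can be sketched as follows. The function names are hypothetical, and the normality criterion is an assumption: this sketch scores each candidate λ by the absolute sample skewness of the transformed data (lower is better), which may differ from the criterion MLJ actually uses.

```julia
using Statistics

# Box-Cox transform of one value with shift c; λ numerically 0 uses the log case.
boxcox(x, λ, c) = abs(λ) < 1e-10 ? log(x + c) : ((x + c)^λ - 1) / λ

# Assumed normality score: absolute sample skewness, Inf if undefined.
function abs_skewness(y)
    m = mean(y)
    s2 = mean((y .- m) .^ 2)
    s3 = mean((y .- m) .^ 3)
    s = s3 / s2^1.5
    return isfinite(s) ? abs(s) : Inf
end

# Grid search over n values of λ in [-0.4, 3], with optional preliminary shift.
function fit_boxcox(v::AbstractVector{<:Real}; n::Int=171, shift::Bool=false)
    c = (shift && any(iszero, v)) ? 0.2 * mean(v) : 0.0
    λs = range(-0.4, 3, length=n)
    scores = [abs_skewness(boxcox.(v, λ, c)) for λ in λs]
    return (λ = λs[argmin(scores)], c = c)
end

v = exp.(range(-2, 2, length=101))   # log of v is symmetric, so λ near 0 is expected
params = fit_boxcox(v)
```

A skewness score keeps the sketch dependency-free; a correlation with normal quantiles would be a closer proxy for normality but needs an inverse error function from outside the standard library.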
MLJ.Transformers.OneHotEncoder (Type)
OneHotEncoder(; features=Symbol[], drop_last=false, ref_type=UInt32)
Unsupervised model for one-hot encoding all features of Multiclass or FiniteOrderedFactor scitype within some table. All such features are encoded unless features is specified and non-empty, in which case only the specified features are encoded.
If drop_last is true, the column for the last level of each categorical feature is dropped. New data to be transformed may lack features present in the fit data, but no new features can be present.
All categorical features to be transformed (which are necessarily of CategoricalValue or CategoricalString eltype) must have a reference type promotable to ref_type. Usually ref_type=UInt32 suffices, but ref_type=Int will always work.
Warning: This transformer assumes that a categorical feature in new data to be transformed will have the same pool encountered during the fit.
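The one-hot encoding of a single categorical feature, including the drop_last option, can be sketched as below. This is a hypothetical illustration, not MLJ's implementation: it takes the levels recorded at fit time as an explicit vector instead of a categorical pool, names output columns in an assumed feature__level style, and, as a design choice of the sketch, raises an error on any value whose level was not recorded, rather than silently assuming the pool is unchanged.

```julia
# Hypothetical sketch (not MLJ's implementation) of one-hot encoding one
# categorical feature, given the levels recorded when the encoder was fit.
function one_hot(name::Symbol, v::AbstractVector, levels::AbstractVector;
                 drop_last::Bool=false)
    # Reject values whose level was not recorded during fitting.
    for x in v
        x in levels || error("value $x of feature $name was not seen during fit")
    end
    kept = drop_last ? levels[1:end-1] : levels
    # One indicator column per retained level, named e.g. :color__red.
    return [Symbol(name, "__", lev) => [x == lev ? 1.0 : 0.0 for x in v] for lev in kept]
end

fitted_levels = ["blue", "green", "red"]   # levels recorded during "fitting"
color = ["red", "blue", "red"]
cols = one_hot(:color, color, fitted_levels)                          # 3 columns
cols_dropped = one_hot(:color, color, fitted_levels; drop_last=true)  # 2 columns
```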