Built-in Transformers
MLJModels.UnivariateStandardizer
UnivariateStandardizer()
Unsupervised model for standardizing (whitening) univariate data.
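The following is a minimal sketch, in plain Julia with only the Statistics standard library, of what univariate standardization involves. Fitting records a mean and a standard deviation, and transforming subtracts the first and divides by the second. The names StandardizerFit, fit_standardizer, standardize and unstandardize are hypothetical and are not part of MLJ. The sample standard deviation (std) is assumed here, which reproduces the numbers in the Standardizer example in this section.

```julia
using Statistics

# Hypothetical fitted state: mean and sample standard deviation of the training vector
struct StandardizerFit
    μ::Float64
    σ::Float64
end

fit_standardizer(v::AbstractVector{<:Real}) = StandardizerFit(mean(v), std(v))

# Whitening: subtract the fitted mean, then divide by the fitted standard deviation
standardize(f::StandardizerFit, v) = (v .- f.μ) ./ f.σ

# Undo the whitening with the same fitted parameters
unstandardize(f::StandardizerFit, z) = z .* f.σ .+ f.μ

v = [0.2, 0.3, 1.0]
f = fit_standardizer(v)
z = standardize(f, v)        # roughly [-0.688, -0.459, 1.147]
v_back = unstandardize(f, z) # recovers v up to rounding
```

Keeping the fitted parameters separate from the transformation means that new data is whitened with the training statistics, not with its own.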
MLJModels.Standardizer
Standardizer(; features=Symbol[])
Unsupervised model for standardizing (whitening) the columns of tabular data. If features is empty, then all columns v for which all elements have Continuous scitype are standardized. For different behaviour (e.g. standardizing counts as well), specify the names of the features to be standardized.
using MLJ, DataFrames
X = DataFrame(x1=[0.2, 0.3, 1.0], x2=[4, 2, 3])
stand_model = Standardizer()
# x1 (Continuous) is standardized; x2 (Count) is left unchanged
transform(fit!(machine(stand_model, X)), X)
3×2 DataFrame
│ Row │ x1 │ x2 │
│ │ Float64 │ Int64 │
├─────┼───────────┼───────┤
│ 1 │ -0.688247 │ 4 │
│ 2 │ -0.458831 │ 2 │
│ 3 │ 1.14708 │ 3 │
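The column-selection rule described above can be sketched without MLJ on a NamedTuple "column table". This is an illustration under stated assumptions. A floating-point element type stands in for the Continuous scitype, and an integer type would correspond to Count. The helper names zscore_column and standardize_table are hypothetical.

```julia
using Statistics

# Hypothetical helper: standardize one column with its own mean and sample std
zscore_column(v) = (v .- mean(v)) ./ std(v)

# With an empty `features`, only columns whose element type is a floating-point type
# (assumed stand-in for Continuous) are standardized; otherwise only the named columns are.
function standardize_table(X::NamedTuple; features::Vector{Symbol}=Symbol[])
    selected(name, col) = isempty(features) ? eltype(col) <: AbstractFloat : name in features
    cols = map(keys(X), values(X)) do name, col
        selected(name, col) ? zscore_column(col) : col
    end
    return NamedTuple{keys(X)}(cols)
end

X = (x1=[0.2, 0.3, 1.0], x2=[4, 2, 3])
Y = standardize_table(X)                     # x2 (integers) passes through unchanged
Y2 = standardize_table(X; features=[:x2])    # explicitly standardize the count column
```

On the example data this gives the same x1 values as the DataFrame output above and leaves x2 untouched.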
MLJModels.OneHotEncoder
OneHotEncoder(; features=Symbol[], drop_last=false, ordered_factor=true)
Unsupervised model for one-hot encoding all features of Finite scitype within some table. If ordered_factor=false, then only Multiclass features are considered. The features encoded are further restricted to those in features, when specified and non-empty.
If drop_last is true, the column for the last level of each categorical feature is dropped. New data to be transformed may lack features present in the fit data, but no new features can be present.
Warning: This transformer assumes that the elements of a categorical feature in new data to be transformed point to the same CategoricalPool object encountered during the fit.
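Here is a minimal sketch of one-hot encoding a single categorical column, including the drop_last option. It works on plain vectors rather than CategoricalArrays, so it does not model the CategoricalPool behaviour mentioned in the warning. Levels are recorded at fit time by sorting the distinct values, which is an assumption of the sketch. The names fit_onehot and onehot are hypothetical.

```julia
# Hypothetical "fit" step: record the sorted distinct levels of a categorical column
fit_onehot(v::AbstractVector) = sort(unique(v))

# Encode v as one Float64 indicator column per recorded level, returned as
# level => column pairs; with drop_last=true the last level's column is omitted.
# In this simplified sketch a value not seen during fitting yields an all-zero row.
function onehot(lvls::AbstractVector, v::AbstractVector; drop_last::Bool=false)
    kept = drop_last ? lvls[1:end-1] : lvls
    return [lvl => Float64[x == lvl for x in v] for lvl in kept]
end

lvls = fit_onehot(["red", "green", "red", "blue"])   # ["blue", "green", "red"]
cols = onehot(lvls, ["red", "blue"])
```

Dropping the last column removes the redundancy among indicator columns, since the remaining ones already determine it.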
MLJModels.FeatureSelector
FeatureSelector(features=Symbol[])
An unsupervised model for filtering features (columns) of a table. If features is empty (the default), only those features encountered during fitting will appear in transformed tables. Alternatively, if a non-empty features is specified, only the specified features are used. An error is thrown if a recorded or specified feature is not present in the transformation input.
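The fit/transform split described above can be sketched on NamedTuple column tables as follows. Fitting records which features to keep, and transforming selects them and fails loudly if one is absent. The names fit_selector and select_features are hypothetical, and throwing an ArgumentError is an assumption of the sketch, not necessarily the error type MLJ uses.

```julia
# Hypothetical "fit" step: record all column names if `features` is empty,
# otherwise the specified ones
fit_selector(X::NamedTuple, features::Vector{Symbol}=Symbol[]) =
    isempty(features) ? collect(keys(X)) : features

# Keep only the recorded features, in recorded order; error if any is missing
function select_features(X::NamedTuple, recorded::Vector{Symbol})
    for f in recorded
        haskey(X, f) || throw(ArgumentError("feature $f not present in the input"))
    end
    return NamedTuple{Tuple(recorded)}(Tuple(X[f] for f in recorded))
end

Xtrain = (a=[1, 2], b=[3, 4], c=[5, 6])
Xnew = (a=[7], b=[8], c=[9], d=[10])
select_features(Xnew, fit_selector(Xtrain))   # column d, unseen at fit time, is dropped
```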
MLJModels.UnivariateBoxCoxTransformer
UnivariateBoxCoxTransformer(; n=171, shift=false)
Unsupervised model specifying a univariate Box-Cox transformation of a single variable taking non-negative values, with a possible preliminary shift. Such a transformation is of the form
x -> ((x + c)^λ - 1)/λ for λ not 0
x -> log(x + c) for λ = 0
On fitting to data, n different values of the Box-Cox exponent λ (between -0.4 and 3) are searched to find the value maximizing normality. If shift=true and zero values are encountered in the data, then the transformation sought includes a preliminary positive shift c of 0.2 times the data mean. If there are no zero values, then no shift is applied.
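The grid search and shift rule can be sketched in plain Julia as follows. The text does not say how normality is measured, so this sketch assumes a simple stand-in score: minus the absolute sample skewness, so more symmetric output scores higher. MLJ's actual criterion may differ. The names boxcox, normality_score and fit_boxcox are hypothetical.

```julia
using Statistics

# Box-Cox transform of one value x with exponent λ and shift c
boxcox(λ, c, x) = abs(λ) < 1e-12 ? log(x + c) : ((x + c)^λ - 1) / λ

# Assumed normality score (not necessarily MLJ's): minus the absolute sample
# skewness; NaN scores (e.g. from log(0) without a shift) are treated as worst
function normality_score(v)
    m, s = mean(v), std(v)
    score = -abs(mean(((v .- m) ./ s) .^ 3))
    return isnan(score) ? -Inf : score
end

# Search n exponents between -0.4 and 3, applying the shift only if requested
# and the data contains zeros
function fit_boxcox(v::AbstractVector{<:Real}; n::Integer=171, shift::Bool=false)
    c = (shift && any(iszero, v)) ? 0.2 * mean(v) : 0.0
    λs = range(-0.4, 3; length=n)
    scores = [normality_score(boxcox.(λ, c, v)) for λ in λs]
    return λs[argmax(scores)], c
end

v = exp.(collect(range(-2, 2; length=201)))   # log-symmetric data, so λ ≈ 0 is best
λ, c = fit_boxcox(v)
λ2, c2 = fit_boxcox([0.0, 1.0, 2.0, 3.0]; shift=true)   # zeros present: c2 = 0.2 * mean
```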
MLJModels.UnivariateDiscretizer
UnivariateDiscretizer(n_classes=512)
Returns an MLJModel for discretizing any continuous vector v (scitype(v) <: AbstractVector{Continuous}), where n_classes describes the resolution of the discretization.
Transformed output w is a vector of ordered factors (scitype(w) <: AbstractVector{<:OrderedFactor}). Specifically, w is a CategoricalVector with element type CategoricalValue{R,R}, where R <: Unsigned is optimized.
The transformation is chosen so that the vector on which the transformer is fit has, in transformed form, an approximately uniform distribution of values.
Example
using MLJ
t = UnivariateDiscretizer(n_classes=10)
discretizer = machine(t, randn(1000))
fit!(discretizer)
v = rand(10)
w = transform(discretizer, v)
v_approx = inverse_transform(discretizer, w) # reconstruction of v from w
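One way to obtain the approximately uniform distribution described above is to cut at quantiles of the training data. The sketch below assumes that approach, and also assumes mid-class quantiles as reconstruction values. It may not match how UnivariateDiscretizer chooses its cut points. It returns plain integer class codes rather than a CategoricalVector, and the names DiscretizerFit, fit_discretizer, discretize and undiscretize are hypothetical.

```julia
using Statistics

# Hypothetical fitted state: interior cut points and one representative value per class
struct DiscretizerFit
    cuts::Vector{Float64}
    reps::Vector{Float64}
end

function fit_discretizer(train::AbstractVector{<:Real}, n_classes::Integer)
    # n_classes - 1 quantiles split the training data into roughly equal-sized classes
    cuts = quantile(train, [k / n_classes for k in 1:n_classes-1])
    # mid-class quantiles serve as reconstruction values for the inverse map
    reps = quantile(train, [(k - 0.5) / n_classes for k in 1:n_classes])
    return DiscretizerFit(cuts, reps)
end

# Class index in 1:n_classes: number of cut points strictly below x, plus one
discretize(d::DiscretizerFit, v) = [searchsortedfirst(d.cuts, x) for x in v]

# Approximate reconstruction: map each class back to its representative value
undiscretize(d::DiscretizerFit, w) = d.reps[w]

train = collect(1.0:1000.0)
d = fit_discretizer(train, 10)
w = discretize(d, train)
v_approx = undiscretize(d, w)
```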
MLJModels.FillImputer
FillImputer(features=[],
            continuous_fill=<median>,
            count_fill=<round_median>,
            finite_fill=<mode>)
Imputes missing data with a fixed value computed on the non-missing values. A different imputing function can be specified for Continuous, Count and Finite data.
Fields
- continuous_fill: function to use on Continuous data, by default the median
- count_fill: function to use on Count data, by default the rounded median
- finite_fill: function to use on Multiclass and OrderedFactor data (including binary data), by default the mode
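The default fill rules listed above can be sketched on single vectors containing missing. Each fill value is computed from the non-missing entries only and then substituted for every missing. The helper names mode_of, continuous_fill, count_fill, finite_fill and impute are hypothetical. How ties in the mode, or halves in the rounded median, are resolved is an assumption here: first occurrence wins for the mode, and Julia's default round-to-even is used for the median.

```julia
using Statistics

# Hypothetical helper: most frequent value, ties resolved by first occurrence
function mode_of(v)
    counts = Dict{eltype(v),Int}()
    for x in v
        counts[x] = get(counts, x, 0) + 1
    end
    best = first(v)
    for x in v
        counts[x] > counts[best] && (best = x)
    end
    return best
end

# Default-style fill rules, each applied to the non-missing values only
continuous_fill(v) = median(v)
count_fill(v) = round(Int, median(v))
finite_fill(v) = mode_of(v)

# Replace every `missing` in v with fillfun applied to the non-missing values
impute(v, fillfun) = coalesce.(v, fillfun(collect(skipmissing(v))))

impute([1.0, missing, 3.0, 10.0], continuous_fill)   # [1.0, 3.0, 3.0, 10.0]
```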