ContinuousEncoder
A model type for constructing a continuous encoder, based on MLJTransforms.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
ContinuousEncoder = @load ContinuousEncoder pkg=MLJTransforms
Do model = ContinuousEncoder() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in ContinuousEncoder(drop_last=...).
Use this model to arrange all features (columns) of a table to have Continuous element scitype, by applying the following protocol to each feature ftr:
- If ftr is already Continuous, retain it.
- If ftr is Multiclass, one-hot encode it.
- If ftr is OrderedFactor, replace it with coerce(ftr, Continuous) (vector of floating point integers), unless one_hot_ordered_factors=true is specified, in which case one-hot encode it.
- If ftr is Count, replace it with coerce(ftr, Continuous).
- If ftr has some other element scitype, or was not observed in fitting the encoder, drop it from the table.
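The protocol above can be sketched as a small dispatch on a column's scitype. This is only an illustration: planned_action is a hypothetical helper, not part of MLJTransforms, and it ignores columns containing missing values.

```julia
using MLJ  # provides `scitype`, `categorical` and the scitype names used below

# Hypothetical helper, not part of MLJTransforms: name the action the
# protocol would apply to a single column, judged by its element scitype.
function planned_action(col; one_hot_ordered_factors=false)
    S = scitype(col)
    S <: AbstractVector{<:Continuous} && return :retain
    S <: AbstractVector{<:Multiclass} && return :one_hot
    if S <: AbstractVector{<:OrderedFactor}
        return one_hot_ordered_factors ? :one_hot : :coerce
    end
    S <: AbstractVector{<:Count} && return :coerce
    return :drop  # any other element scitype, such as Textual
end

planned_action([1.85, 1.67])                           # Continuous column
planned_action(categorical(["A", "B"], ordered=true))  # OrderedFactor column
planned_action(["the force", "be"])                    # Textual column
```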
Warning: This transformer assumes that levels(col) for any Multiclass or OrderedFactor column, col, is the same for training data and new data to be transformed.
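One way to satisfy this assumption is to construct categorical columns with an explicit, shared list of levels. The following is a minimal sketch, assuming ContinuousEncoder is in scope after using MLJ (otherwise load it with the @load statement above); the data and the name all_names are made up for illustration.

```julia
using MLJ

# Illustrative data: fix the levels up front so that training data and new
# data share the same pool, even when new data shows only some classes.
all_names = ["Danesh", "John", "Lee", "Mary"]
Xtrain = (name=categorical(["Danesh", "Lee", "Mary", "John"], levels=all_names),)
Xnew = (name=categorical(["Lee", "Lee"], levels=all_names),)

mach = fit!(machine(ContinuousEncoder(), Xtrain), verbosity=0)
Wtrain = transform(mach, Xtrain)
Wnew = transform(mach, Xnew)

# Both outputs should have the same one-hot columns.
schema(Wnew).names == schema(Wtrain).names
```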
To selectively one-hot-encode categorical features (without dropping features) use OneHotEncoder instead.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X)
where
- X: any Tables.jl compatible table. Features can be of mixed type but only those with element scitype Multiclass or OrderedFactor can be encoded. Check column scitypes with schema(X).
Train the machine using fit!(mach, rows=...).
Hyper-parameters
- drop_last=true: whether to drop the column corresponding to the final class of one-hot encoded features. For example, a three-class feature is spawned into three new features if drop_last=false, but just two features otherwise.
- one_hot_ordered_factors=false: whether to one-hot any feature with OrderedFactor element scitype, or to instead coerce it directly to a (single) Continuous feature using the order.
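To see the effect of the two hyper-parameters, the sketch below encodes the same small table twice, once with the defaults and once with both options switched. It assumes ContinuousEncoder is in scope after using MLJ (otherwise load it with the @load statement above); the table is made up for illustration.

```julia
using MLJ

X = (grade=categorical(["A", "B", "A", "C"], ordered=true),
     height=[1.85, 1.67, 1.5, 1.67])

# Defaults: the ordered factor is coerced to a single Continuous feature.
mach_default = fit!(machine(ContinuousEncoder(), X), verbosity=0)
W_default = transform(mach_default, X)

# One-hot encode the ordered factor and keep a column for every class.
encoder = ContinuousEncoder(drop_last=false, one_hot_ordered_factors=true)
mach_onehot = fit!(machine(encoder, X), verbosity=0)
W_onehot = transform(mach_onehot, X)

schema(W_default).names  # one feature for grade, plus height
schema(W_onehot).names   # one feature per grade class, plus height
```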
Fitted parameters
The fields of fitted_params(mach) are:
- features_to_keep: names of features that will not be dropped from the table
- one_hot_encoder: the OneHotEncoder model instance for handling the one-hot encoding
- one_hot_encoder_fitresult: the fitted parameters of the OneHotEncoder model
Report
- features_to_keep: names of input features that will not be dropped from the table
- new_features: names of all output features
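The fields listed above can be inspected on a trained machine. A minimal sketch, again assuming ContinuousEncoder is in scope after using MLJ and using a made-up table:

```julia
using MLJ

X = (name=categorical(["Danesh", "Lee", "Mary", "John"]),
     height=[1.85, 1.67, 1.5, 1.67],
     comments=["the force", "be", "with you", "too"])

mach = fit!(machine(ContinuousEncoder(), X), verbosity=0)

fp = fitted_params(mach)
fp.features_to_keep   # input features that survive encoding
fp.one_hot_encoder    # the internal OneHotEncoder model instance

rpt = report(mach)
rpt.features_to_keep  # same names, as recorded in the report
rpt.new_features      # names of all output features
```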
Example
using MLJ

X = (name=categorical(["Danesh", "Lee", "Mary", "John"]),
     grade=categorical(["A", "B", "A", "C"], ordered=true),
     height=[1.85, 1.67, 1.5, 1.67],
     n_devices=[3, 2, 4, 3],
     comments=["the force", "be", "with you", "too"])
julia> schema(X)
┌───────────┬──────────────────┐
│ names     │ scitypes         │
├───────────┼──────────────────┤
│ name      │ Multiclass{4}    │
│ grade     │ OrderedFactor{3} │
│ height    │ Continuous       │
│ n_devices │ Count            │
│ comments  │ Textual          │
└───────────┴──────────────────┘
encoder = ContinuousEncoder(drop_last=true)
mach = fit!(machine(encoder, X))
W = transform(mach, X)
julia> schema(W)
┌──────────────┬────────────┐
│ names        │ scitypes   │
├──────────────┼────────────┤
│ name__Danesh │ Continuous │
│ name__John   │ Continuous │
│ name__Lee    │ Continuous │
│ grade        │ Continuous │
│ height       │ Continuous │
│ n_devices    │ Continuous │
└──────────────┴────────────┘
julia> setdiff(schema(X).names, report(mach).features_to_keep) ## dropped features
1-element Vector{Symbol}:
:comments
See also OneHotEncoder