ContinuousEncoder
A model type for constructing a continuous encoder, based on MLJModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
ContinuousEncoder = @load ContinuousEncoder pkg=MLJModels
Do model = ContinuousEncoder() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in ContinuousEncoder(drop_last=...).
Use this model to arrange all features (columns) of a table to have Continuous element scitype, by applying the following protocol to each feature ftr:
- If ftr is already Continuous, retain it.
- If ftr is Multiclass, one-hot encode it.
- If ftr is OrderedFactor, replace it with coerce(ftr, Continuous) (a vector of integer-valued floating point numbers), unless one_hot_ordered_factors=true is specified, in which case one-hot encode it.
- If ftr is Count, replace it with coerce(ftr, Continuous).
- If ftr has some other element scitype, or was not observed in fitting the encoder, drop it from the table.
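The two coerce steps of this protocol can be tried by hand on individual columns. The following is a minimal sketch, assuming that `using MLJ` brings `categorical`, `coerce`, `elscitype` and the scitype names into scope; the vectors are made-up data.

```julia
using MLJ

# Made-up columns, one for each scitype that is converted by coercion.
grade = categorical(["A", "B", "A", "C"], ordered=true)  # OrderedFactor{3}
n_devices = [3, 2, 4, 3]                                  # Count

# OrderedFactor -> Continuous: each value is replaced by a float
# that follows the order of the levels.
grade_cont = coerce(grade, Continuous)

# Count -> Continuous: integers become floats.
n_devices_cont = coerce(n_devices, Continuous)

# Both results now have Continuous element scitype.
elscitype(grade_cont), elscitype(n_devices_cont)
```

Multiclass features are handled differently (one-hot encoding), so they are not covered by this sketch.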
Warning: This transformer assumes that levels(col) for any Multiclass or OrderedFactor column, col, is the same for training data and new data to be transformed.
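One way to satisfy this assumption is to state the levels explicitly when building both tables. The sketch below relies on the `levels` keyword of `categorical` from CategoricalArrays.jl and assumes `using MLJ` re-exports `categorical` and `levels`; the table, the column name and `color_levels` are hypothetical.

```julia
using MLJ

# Hypothetical full set of classes, shared by training and new data.
color_levels = ["blue", "green", "red"]

Xtrain = (color = categorical(["red", "green", "blue", "red"], levels=color_levels),)

# The new data contains only one class, but declares the same levels.
Xnew = (color = categorical(["green", "green"], levels=color_levels),)

# The check that the warning asks for.
same_levels = levels(Xnew.color) == levels(Xtrain.color)

mach = fit!(machine(ContinuousEncoder(drop_last=false), Xtrain), verbosity=0)
Wnew = transform(mach, Xnew)
schema(Wnew).names
```

Without the explicit `levels` keyword, `Xnew.color` would carry a single level and the assumption in the warning would not hold.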
To selectively one-hot-encode categorical features (without dropping columns) use OneHotEncoder instead.
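As a rough illustration of the difference, the sketch below one-hot encodes a single chosen column with OneHotEncoder and leaves the others in place. It assumes `using MLJ` makes `OneHotEncoder` available and uses its `features` and `drop_last` hyper-parameters; the table is made up.

```julia
using MLJ

# Made-up table with two categorical columns and one text column.
X = (color = categorical(["red", "green", "blue", "red"]),
     shape = categorical(["round", "square", "round", "round"]),
     note  = ["a", "b", "c", "d"])  # Textual

# One-hot encode only `color`; `shape` and `note` are passed through.
hot = OneHotEncoder(features=[:color], drop_last=false)
mach = fit!(machine(hot, X), verbosity=0)
W = transform(mach, X)

schema(W).names
```

A ContinuousEncoder fitted on the same table would instead encode both categorical columns and drop `note`.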
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X)
where
- X: any Tables.jl compatible table. Columns can be of mixed type but only those with element scitype Multiclass or OrderedFactor can be encoded. Check column scitypes with schema(X).
Train the machine using fit!(mach, rows=...).
Hyper-parameters
- drop_last=true: whether to drop the column corresponding to the final class of one-hot encoded features. For example, a three-class feature is spawned into three new features if drop_last=false, but just two features otherwise.
- one_hot_ordered_factors=false: whether to one-hot encode any feature with OrderedFactor element scitype, or to instead coerce it directly to a (single) Continuous feature using the order of its levels.
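The effect of the two hyper-parameters can be compared side by side. This is a sketch on a made-up table, with both hyper-parameters set explicitly so that it does not depend on the defaults; the feature names in the comments assume the `feature__level` naming visible in the example output of this page and alphabetically sorted levels.

```julia
using MLJ

# Made-up table: one Multiclass and one OrderedFactor column.
X = (color = categorical(["red", "green", "blue", "red"]),
     size  = categorical(["S", "M", "L", "M"], ordered=true))

# Keep every one-hot column; coerce the ordered factor to a single column.
enc_a = ContinuousEncoder(drop_last=false, one_hot_ordered_factors=false)

# Drop the last one-hot column; one-hot encode the ordered factor as well.
enc_b = ContinuousEncoder(drop_last=true, one_hot_ordered_factors=true)

Wa = transform(fit!(machine(enc_a, X), verbosity=0), X)
Wb = transform(fit!(machine(enc_b, X), verbosity=0), X)

names_a = schema(Wa).names  # three color__* columns plus size
names_b = schema(Wb).names  # two color__* columns plus two size__* columns
```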
Fitted parameters
The fields of fitted_params(mach) are:
- features_to_keep: names of features that will not be dropped from the table
- one_hot_encoder: the OneHotEncoder model instance for handling the one-hot encoding
- one_hot_encoder_fitresult: the fitted parameters of the OneHotEncoder model
Report
- features_to_keep: names of input features that will not be dropped from the table
- new_features: names of all output features
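The fields listed above can be read off a trained machine as follows. This is a minimal sketch on a made-up table, assuming `using MLJ` makes `ContinuousEncoder`, `fitted_params` and `report` available.

```julia
using MLJ

# Made-up table; `memo` is Textual and so will be dropped.
X = (city = categorical(["Oslo", "Lima", "Oslo"]),
     temp = [3.5, 19.0, 4.1],
     memo = ["x", "y", "z"])

mach = fit!(machine(ContinuousEncoder(drop_last=false), X), verbosity=0)

kept_fp  = fitted_params(mach).features_to_keep  # input features that are retained
kept_rep = report(mach).features_to_keep         # the same names, via the report
new_ftrs = report(mach).new_features             # names of all output features
```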
Example
using MLJ

X = (name=categorical(["Danesh", "Lee", "Mary", "John"]),
     grade=categorical(["A", "B", "A", "C"], ordered=true),
     height=[1.85, 1.67, 1.5, 1.67],
     n_devices=[3, 2, 4, 3],
     comments=["the force", "be", "with you", "too"])
julia> schema(X)
┌───────────┬──────────────────┐
│ names     │ scitypes         │
├───────────┼──────────────────┤
│ name      │ Multiclass{4}    │
│ grade     │ OrderedFactor{3} │
│ height    │ Continuous       │
│ n_devices │ Count            │
│ comments  │ Textual          │
└───────────┴──────────────────┘
encoder = ContinuousEncoder(drop_last=true)
mach = fit!(machine(encoder, X))
W = transform(mach, X)
julia> schema(W)
┌──────────────┬────────────┐
│ names        │ scitypes   │
├──────────────┼────────────┤
│ name__Danesh │ Continuous │
│ name__John   │ Continuous │
│ name__Lee    │ Continuous │
│ grade        │ Continuous │
│ height       │ Continuous │
│ n_devices    │ Continuous │
└──────────────┴────────────┘
julia> setdiff(schema(X).names, report(mach).features_to_keep) ## dropped features
1-element Vector{Symbol}:
 :comments
See also OneHotEncoder.