OrdinalEncoder
A model type for constructing an ordinal encoder, based on MLJTransforms.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
OrdinalEncoder = @load OrdinalEncoder pkg=MLJTransforms
Do model = OrdinalEncoder() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in OrdinalEncoder(features=...).
OrdinalEncoder implements ordinal encoding, which replaces the categorical values in the specified categorical features with integers (ordered arbitrarily). This creates an implicit ordering between categories, which may not be an appropriate modelling assumption.
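The idea can be illustrated without any packages. The following is a minimal sketch in plain Julia, not the MLJTransforms implementation: the helpers ordinal_index and encode are hypothetical names, and the choice of sorted order for assigning integers is an assumption made only so that the result is reproducible.

```julia
# Minimal sketch of ordinal encoding for one categorical column.
# `ordinal_index` and `encode` are hypothetical helpers, not MLJTransforms functions.

# Map each distinct level to an integer code (assumption: position in sorted order).
function ordinal_index(column)
    levels = sort(unique(column))
    return Dict(level => i for (i, level) in enumerate(levels))
end

# Replace every value with its integer code, converted to the requested numeric type.
encode(column, index; output_type = Float32) = output_type[index[v] for v in column]

A = ["g", "b", "g", "r", "r"]
index = ordinal_index(A)      # "b" => 1, "g" => 2, "r" => 3
encoded = encode(A, index)    # codes 2, 1, 2, 3, 3 stored as Float32
```

The codes carry an order ("b" < "g" < "r") that the original labels may not have, which is the modelling caveat mentioned above.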
Training data
In MLJ (or MLJBase), bind an instance model to data with
mach = machine(model, X)
Here:
- X is any table of input features (e.g., a DataFrame). Features to be transformed must have element scitype Multiclass or OrderedFactor. Use schema(X) to check scitypes.
Train the machine using fit!(mach, rows=...).
Hyper-parameters
- features=[]: A list of names of categorical features, given as symbols, to exclude from or include in encoding, according to the value of ignore; or a single symbol (which is treated as a vector with one symbol); or a callable that returns true for features to be included/excluded.
- ignore=true: Whether to exclude or include the features given in features.
- ordered_factor=false: Whether to encode OrderedFactor features or ignore them.
- output_type: The numerical concrete type of the encoded features. Default is Float32.
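The interaction between features and ignore can be sketched in plain Julia. This is an illustration only, not the library's code: select_features is a hypothetical helper, and it assumes that ignore=true means the listed features are excluded while ignore=false means only the listed features are encoded. Edge cases (such as an empty list with ignore=false) are not meant to reflect the package's behaviour.

```julia
# Hypothetical helper showing how `features` and `ignore` might combine.
# `candidates` stands for the names of the categorical columns in the table.
function select_features(candidates, features, ignore)
    # A single symbol is treated as a one-element vector.
    feats = features isa Symbol ? [features] : features
    # A vector is checked for membership; anything else is called as a predicate.
    listed(name) = feats isa AbstractVector ? name in feats : feats(name)
    # ignore=true drops the listed names; ignore=false keeps only the listed names.
    return [name for name in candidates if listed(name) != ignore]
end

categorical = [:A, :C, :D]
all_encoded  = select_features(categorical, [], true)                  # defaults: nothing excluded
without_A    = select_features(categorical, [:A], true)                # exclude :A
only_A       = select_features(categorical, :A, false)                 # include only :A
by_predicate = select_features(categorical, name -> name == :C, false) # callable form
```

Under these assumptions, the defaults features=[] and ignore=true select every categorical feature, which is consistent with the example at the end of this page, where no features are listed.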
Operations
- transform(mach, Xnew): Apply ordinal encoding to selected Multiclass or OrderedFactor features of Xnew specified by hyper-parameters, and return the new table. Features that are neither Multiclass nor OrderedFactor are always left unchanged.
Fitted parameters
The fields of fitted_params(mach) are:
- index_given_feat_level: A dictionary that maps each level for each column in a subset of the categorical features of X into an integer.
Report
The fields of report(mach) are:
- encoded_features: The subset of the categorical features of X that were encoded.
Examples
using MLJ
## Define categorical features
A = ["g", "b", "g", "r", "r",]
B = [1.0, 2.0, 3.0, 4.0, 5.0,]
C = ["f", "f", "f", "m", "f",]
D = [true, false, true, false, true,]
E = [1, 2, 3, 4, 5,]
## Combine into a named tuple
X = (A = A, B = B, C = C, D = D, E = E)
## Coerce A, C, D to multiclass and B to continuous and E to ordinal
X = coerce(X,
    :A => Multiclass,
    :B => Continuous,
    :C => Multiclass,
    :D => Multiclass,
    :E => OrderedFactor,
)
## Check scitype coercion:
schema(X)
encoder = OrdinalEncoder(ordered_factor = false)
mach = fit!(machine(encoder, X))
Xnew = transform(mach, X)
julia> Xnew
(A = [2, 1, 2, 3, 3],
 B = [1.0, 2.0, 3.0, 4.0, 5.0],
 C = [1, 1, 1, 2, 1],
 D = [2, 1, 2, 1, 2],
 E = CategoricalArrays.CategoricalValue{Int64, UInt32}[1, 2, 3, 4, 5],)
See also TargetEncoder.