FeatureSelector
FeatureSelector
A model type for constructing a feature selector, based on FeatureSelection.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
FeatureSelector = @load FeatureSelector pkg=FeatureSelection
Do model = FeatureSelector()
to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in FeatureSelector(features=...)
.
Use this model to select features (columns) of a table, usually as part of a model Pipeline
.
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, X)
where
X
: any table of input features, where "table" is in the sense of Tables.jl
Train the machine using fit!(mach, rows=...)
.
Hyper-parameters
features
: one of the following, with the behavior indicated:[]
(empty, the default): filter out all features (columns) which were not encountered in training- non-empty vector of feature names (symbols): keep only the specified features (
ignore=false
) or keep only unspecified features (ignore=true
) - function or other callable: keep a feature if the callable returns
true
on its name. For example, specifyingFeatureSelector(features = name -> name in [:x1, :x3], ignore = true)
has the same effect asFeatureSelector(features = [:x1, :x3], ignore = true)
, namely to select all features, with the exception of:x1
and:x3
.
ignore
: whether to ignore or keep specifiedfeatures
, as explained above
Operations
transform(mach, Xnew)
: select features from the tableXnew
as specified by the model, taking features seen during training into account, if relevant
Fitted parameters
The fields of fitted_params(mach)
are:
features_to_keep
: the features that will be selected
Example
using MLJ
X = (ordinal1 = [1, 2, 3],
ordinal2 = coerce(["x", "y", "x"], OrderedFactor),
ordinal3 = [10.0, 20.0, 30.0],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = coerce(["Your father", "he", "is"], Multiclass));
selector = FeatureSelector(features=[:ordinal3, ], ignore=true);
julia> transform(fit!(machine(selector, X)), X)
(ordinal1 = [1, 2, 3],
ordinal2 = CategoricalValue{Symbol,UInt32}["x", "y", "x"],
ordinal4 = [-20.0, -30.0, -40.0],
nominal = CategoricalValue{String,UInt32}["Your father", "he", "is"],)