Model Search
MLJ has a model registry that lets the user search models and their properties without loading the packages containing the model code. This makes it possible to efficiently find all models solving a given machine learning task. The task itself is specified with the help of the matching method, and the search is executed with the models method, as detailed below.
Model metadata
Terminology. In this section the word "model" refers to the metadata entry in the registry of an actual model struct, as appearing elsewhere in the manual. One can obtain such an entry with the info command:
julia> info("PCA")
Principal component analysis. Learns a linear transformation to project the data on a lower dimensional space while preserving most of the initial variance.
→ based on [MultivariateStats](https://github.com/JuliaStats/MultivariateStats.jl).
→ do `@load PCA pkg="MultivariateStats"` to use the model.
→ do `?PCA` for documentation.
(name = "PCA",
package_name = "MultivariateStats",
is_supervised = false,
docstring = "Principal component analysis. Learns a linear transformation to project the data on a lower dimensional space while preserving most of the initial variance.\n→ based on [MultivariateStats](https://github.com/JuliaStats/MultivariateStats.jl).\n→ do `@load PCA pkg=\"MultivariateStats\"` to use the model.\n→ do `?PCA` for documentation.",
hyperparameter_types = ["Union{Nothing, Int64}", "Symbol", "Float64", "Union{Nothing, Array{Float64,1}, Real}"],
hyperparameters = Symbol[:maxoutdim, :method, :pratio, :mean],
implemented_methods = Symbol[:fit, :fitted_params, :transform],
is_pure_julia = true,
is_wrapper = false,
load_path = "MLJModels.MultivariateStats_.PCA",
package_license = "MIT",
package_url = "https://github.com/JuliaStats/MultivariateStats.jl",
package_uuid = "6f286f6a-111f-5878-ab1e-185364afe411",
supports_online = false,
input_scitype = Table{_s13} where _s13<:(AbstractArray{_s12,1} where _s12<:Continuous),
output_scitype = Unknown,)
If two models with the same name occur in different packages, the package name must be specified, as in info("LinearRegressor", pkg="GLM").
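The entry printed above has the form of a named tuple, so individual properties can be read with ordinary field access. The following is a minimal sketch, assuming a hand-written entry that copies a few fields from the PCA output above rather than querying the registry, so it runs without MLJ installed. The variable name entry is purely illustrative.

```julia
# Hand-copied subset of the PCA metadata shown above
# (an assumption for illustration; not fetched from the registry).
entry = (name = "PCA",
         package_name = "MultivariateStats",
         is_supervised = false,
         is_pure_julia = true,
         hyperparameters = [:maxoutdim, :method, :pratio, :mean])

# Properties are read with dot syntax, as in the test functions used for searching.
pkg = entry.package_name
supervised = entry.is_supervised

# Not every entry carries every field: this subset has no prediction_type.
has_prediction_type = haskey(entry, :prediction_type)
```

Checking for a field with haskey before reading it is one way to write test functions that do not error on entries lacking that field.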
General model queries
We list all models with models(), and list the models for which code is already loaded with localmodels():
julia> localmodels()
11-element Array{NamedTuple,1}:
(name = ConstantClassifier, package_name = MLJModels, ... )
(name = ConstantRegressor, package_name = MLJModels, ... )
(name = DeterministicConstantClassifier, package_name = MLJModels, ... )
(name = DeterministicConstantRegressor, package_name = MLJModels, ... )
(name = FeatureSelector, package_name = MLJModels, ... )
(name = FillImputer, package_name = MLJModels, ... )
(name = OneHotEncoder, package_name = MLJModels, ... )
(name = Standardizer, package_name = MLJModels, ... )
(name = UnivariateBoxCoxTransformer, package_name = MLJModels, ... )
(name = UnivariateDiscretizer, package_name = MLJModels, ... )
(name = UnivariateStandardizer, package_name = MLJModels, ... )
julia> localmodels()[2]
Constant regressor (Probabilistic).
→ based on [MLJModels](https://github.com/alan-turing-institute/MLJModels.jl).
→ do `@load ConstantRegressor pkg="MLJModels"` to use the model.
→ do `?ConstantRegressor` for documentation.
(name = "ConstantRegressor",
package_name = "MLJModels",
is_supervised = true,
docstring = "Constant regressor (Probabilistic).\n→ based on [MLJModels](https://github.com/alan-turing-institute/MLJModels.jl).\n→ do `@load ConstantRegressor pkg=\"MLJModels\"` to use the model.\n→ do `?ConstantRegressor` for documentation.",
hyperparameter_types = ["Type{D} where D"],
hyperparameters = Symbol[:distribution_type],
implemented_methods = Symbol[:fit, :predict, :fitted_params],
is_pure_julia = true,
is_wrapper = false,
load_path = "MLJModels.ConstantRegressor",
package_license = "MIT",
package_url = "https://github.com/alan-turing-institute/MLJModels.jl",
package_uuid = "d491faf4-2d78-11e9-2867-c94bc002c0b7",
prediction_type = :probabilistic,
supports_online = false,
supports_weights = false,
input_scitype = Table{_s13} where _s13<:(AbstractArray{_s12,1} where _s12<:Union{Missing, Found}),
target_scitype = AbstractArray{Continuous,1},)
If models is passed any Bool-valued function test, it returns every model for which test(model) is true, as in
julia> test(model) = model.is_supervised &&
MLJ.Table(Continuous) <: model.input_scitype &&
AbstractVector{<:Multiclass{3}} <: model.target_scitype &&
model.prediction_type == :deterministic
test (generic function with 1 method)
julia> models(test)
13-element Array{NamedTuple,1}:
(name = DeterministicConstantClassifier, package_name = MLJModels, ... )
(name = LinearSVC, package_name = LIBSVM, ... )
(name = NuSVC, package_name = LIBSVM, ... )
(name = PassiveAggressiveClassifier, package_name = ScikitLearn, ... )
(name = PerceptronClassifier, package_name = ScikitLearn, ... )
(name = RandomForestRegressor, package_name = ScikitLearn, ... )
(name = RidgeCVClassifier, package_name = ScikitLearn, ... )
(name = RidgeClassifier, package_name = ScikitLearn, ... )
(name = SGDClassifier, package_name = ScikitLearn, ... )
(name = SVC, package_name = LIBSVM, ... )
(name = SVMClassifier, package_name = ScikitLearn, ... )
(name = SVMLClassifier, package_name = ScikitLearn, ... )
(name = SVMNuClassifier, package_name = ScikitLearn, ... )
Multiple test arguments may be passed to models, in which case they are applied conjunctively: a model is listed only if it passes every test.
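The conjunctive behaviour can be illustrated without the registry. The following is a minimal sketch, assuming a toy list of named tuples in place of the real registry; toy_registry and toy_models are hypothetical names introduced here and are not part of MLJ, and the entries are made up for illustration.

```julia
# Toy stand-in for the registry: each entry is a named tuple of metadata.
# These entries are illustrative only and are not read from the MLJ registry.
toy_registry = [
    (name = "ConstantRegressor", is_supervised = true,  prediction_type = :probabilistic),
    (name = "LinearSVC",         is_supervised = true,  prediction_type = :deterministic),
    (name = "PCA",               is_supervised = false, prediction_type = :unknown),
]

# Hypothetical helper mimicking `models(conditions...)`: keep the entries
# for which every condition returns true. With no conditions, keep everything.
toy_models(conditions...) =
    filter(m -> all(c -> c(m), conditions), toy_registry)

# Two independent Bool-valued tests.
supervised(m) = m.is_supervised
deterministic(m) = m.prediction_type == :deterministic

# Both tests must pass for an entry to be selected.
selected = [m.name for m in toy_models(supervised, deterministic)]
```

Splitting a search into small named tests like this keeps each condition reusable across queries.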
Matching models to data
The matching method described below is experimental and may break in subsequent MLJ releases.
Common searches are streamlined with the help of the matching command, defined as follows:
matching(model, X, y) == true exactly when model is supervised and admits inputs and targets with the scientific types of X and y, respectively.
matching(model, X) == true exactly when model is unsupervised and admits inputs with the scientific types of X.
So, to search for all supervised probabilistic models handling input X and target y, one can define the testing function task by
task(model) = matching(model, X, y) && model.prediction_type == :probabilistic
and execute the search with
models(task)
Also defined are Bool-valued callable objects matching(model), matching(X, y) and matching(X), which are curried versions of the methods above. For example, matching(X, y)(model) = matching(model, X, y).
So, to search for all models compatible with input X and target y, for example, one executes
models(matching(X, y))
while the preceding search can also be written
models() do model
    matching(model, X, y) &&
    model.prediction_type == :probabilistic
end
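The curried form and the do-block form are both ordinary Julia: a curried call returns a function of the model, and a do block passes an anonymous function as the first argument. The following is a minimal sketch, assuming a toy registry and stand-in functions; toy_registry, toy_models and toy_matching are hypothetical names, and toy_matching deliberately ignores the data and only checks is_supervised, which the real matching does not do.

```julia
# Toy stand-in for the registry (illustrative entries, not read from MLJ).
toy_registry = [
    (name = "ConstantRegressor", is_supervised = true, prediction_type = :probabilistic),
    (name = "LinearSVC",         is_supervised = true, prediction_type = :deterministic),
]

# Hypothetical helper standing in for `models(test)`.
toy_models(test) = filter(test, toy_registry)

# Hypothetical stand-in for `matching(model, X, y)`. It ignores X and y and
# only checks that the entry is supervised; the real method compares scitypes.
toy_matching(model, X, y) = model.is_supervised

# Curried version: fixing X and y returns a Bool-valued function of the model.
toy_matching(X, y) = model -> toy_matching(model, X, y)

X, y = [1.0, 2.0], [0.5, 1.5]   # placeholder data

# Search using the curried form.
curried = toy_models(toy_matching(X, y))

# Search using a do block, adding a second condition inline.
with_do = toy_models() do model
    toy_matching(model, X, y) && model.prediction_type == :probabilistic
end
```

The do-block form is convenient when the test combines several conditions and is only needed once.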
API
MLJBase.models - Function.
models()
List all models in the MLJ registry. Here and below model means the registry metadata entry for a genuine model type (a proxy for types whose defining code may not be loaded).
models(conditions...)
List all models satisfying the specified conditions. A condition is any Bool-valued function on models.
Excluded from the listings are the built-in model wrappers EnsembleModel, TunedModel, and IteratedModel.
Example
If
task(model) = model.is_supervised && model.prediction_type == :probabilistic
then models(task) lists all supervised models making probabilistic predictions.
See also: localmodels.
models(N::AbstractNode)
A vector of all models referenced by a node N, each model appearing exactly once.
MLJModels.localmodels - Function.
localmodels(; modl=Main)
localmodels(conditions...; modl=Main)
List all models whose names are in the namespace of the specified module modl, restricted to those meeting the conditions, if specified. Here a condition is a Bool-valued function on models.
See also: models.
MLJ.matching - Function.
matching(model, X, y)
Returns true exactly when the registry metadata entry model is supervised and admits inputs and targets with the scientific types of X and y, respectively.
matching(model, X)
Returns true exactly when model is unsupervised and admits inputs with the scientific types of X.
matching(model), matching(X, y), matching(X)
Curried versions of the preceding methods, i.e., Bool-valued callable objects satisfying matching(X, y)(model) = matching(model, X, y), etc.
Example
models(matching(X))
Finds all unsupervised models compatible with input data X.
models() do model
    matching(model, X, y) && model.prediction_type == :probabilistic
end
Finds all supervised models compatible with input data X and target data y and making probabilistic predictions.
See also: models.