Model Search
In addition to perusing the Model Browser, one can programmatically search MLJ's Model Registry without actually loading all the packages providing model code. This allows you to efficiently find all models solving a given machine learning task. The task itself is specified with the help of the matching method, and the search is executed with the models method, as detailed below.
For commonly encountered problems with model search, see also Preparing Data.
A table of all models is also given at List of Supported Models.
Model metadata
Terminology. In this section the word "model" refers to a metadata entry in the model registry, as opposed to an actual model struct that such an entry represents. One can obtain such an entry with the info command:
julia> info("PCA")
(name = "PCA", package_name = "MultivariateStats", is_supervised = false, abstract_type = Unsupervised, constructor = nothing, deep_properties = (), docstring = "```\nPCA\n```\n\nA model type for constructing a pca, ...", fit_data_scitype = Tuple{Table{<:AbstractVector{<:Continuous}}}, human_name = "pca", hyperparameter_ranges = (nothing, nothing, nothing, nothing), hyperparameter_types = ("Int64", "Symbol", "Float64", "Union{Nothing, Real, Vector{Float64}}"), hyperparameters = (:maxoutdim, :method, :variance_ratio, :mean), implemented_methods = [:clean!, :fit, :fitted_params, :inverse_transform, :transform], inverse_transform_scitype = Table{<:AbstractVector{<:Continuous}}, is_pure_julia = true, is_wrapper = false, iteration_parameter = nothing, load_path = "MLJMultivariateStatsInterface.PCA", package_license = "MIT", package_url = "https://github.com/JuliaStats/MultivariateStats.jl", package_uuid = "6f286f6a-111f-5878-ab1e-185364afe411", predict_scitype = Unknown, prediction_type = :unknown, reporting_operations = (), reports_feature_importances = false, supports_class_weights = false, supports_online = false, supports_training_losses = false, supports_weights = false, target_in_fit = false, transform_scitype = Table{<:AbstractVector{<:Continuous}}, input_scitype = Table{<:AbstractVector{<:Continuous}}, target_scitype = Unknown, output_scitype = Table{<:AbstractVector{<:Continuous}})
So a "model" in the present context is just a named tuple containing metadata, and not an actual model type or instance. If two models with the same name occur in different packages, the package name must be specified, as in info("LinearRegressor", pkg="GLM").
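A minimal sketch of reading these metadata entries, assuming MLJ is installed (the model-providing packages themselves need not be). Since each entry is a named tuple, its fields are read with ordinary dot syntax. The names `pca_meta` and `glm_meta` are illustrative only, and field values depend on the registry version.

```julia
using MLJ  # assumption: MLJ is installed; no model code is loaded here

# Metadata entry for PCA; `pkg` guards against name clashes across packages
pca_meta = info("PCA", pkg="MultivariateStats")

# Fields are read with dot syntax, as for any NamedTuple
println(pca_meta.package_name)     # name of the providing package
println(pca_meta.hyperparameters)  # tuple of hyperparameter names (Symbols)
println(pca_meta.is_supervised)    # PCA is an unsupervised model

# A name registered by several packages needs the `pkg` keyword
glm_meta = info("LinearRegressor", pkg="GLM")
println(glm_meta.load_path)        # where the model type would be loaded from
```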
Model document strings can be retrieved, without importing the defining code, using the doc function:
doc("DecisionTreeClassifier", pkg="DecisionTree")
General model queries
We list all models (named tuples) using models(), and list the models for which code is already loaded with localmodels():
julia> localmodels()
60-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :constructor, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :target_in_fit, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = AdaBoostStumpClassifier, package_name = DecisionTree, ... )
 (name = AffinityPropagation, package_name = Clustering, ... )
 (name = BayesianLDA, package_name = MultivariateStats, ... )
 (name = BayesianSubspaceLDA, package_name = MultivariateStats, ... )
 (name = ConstantClassifier, package_name = MLJModels, ... )
 (name = ConstantRegressor, package_name = MLJModels, ... )
 (name = ContinuousEncoder, package_name = MLJModels, ... )
 (name = DBSCAN, package_name = Clustering, ... )
 (name = DecisionTreeClassifier, package_name = DecisionTree, ... )
 (name = DecisionTreeRegressor, package_name = DecisionTree, ... )
 ⋮
 (name = RidgeRegressor, package_name = MultivariateStats, ... )
 (name = RobustRegressor, package_name = MLJLinearModels, ... )
 (name = Standardizer, package_name = MLJModels, ... )
 (name = SubspaceLDA, package_name = MultivariateStats, ... )
 (name = UnivariateBoxCoxTransformer, package_name = MLJModels, ... )
 (name = UnivariateDiscretizer, package_name = MLJModels, ... )
 (name = UnivariateFillImputer, package_name = MLJModels, ... )
 (name = UnivariateStandardizer, package_name = MLJModels, ... )
 (name = UnivariateTimeTypeToContinuous, package_name = MLJModels, ... )
julia> localmodels()[2]
(name = "AffinityPropagation", package_name = "Clustering", is_supervised = false, abstract_type = Static, constructor = nothing, deep_properties = (), docstring = "```\nAffinityPropagation\n```\n\nA model type for cons...", fit_data_scitype = Tuple{}, human_name = "Affinity Propagation clusterer", hyperparameter_ranges = (nothing, nothing, nothing, nothing, nothing), hyperparameter_types = ("Float64", "Int64", "Float64", "Union{Nothing, Float64}", "Distances.SemiMetric"), hyperparameters = (:damp, :maxiter, :tol, :preference, :metric), implemented_methods = [:clean!, :predict], inverse_transform_scitype = Tuple{Table{<:AbstractVector{<:Continuous}}}, is_pure_julia = true, is_wrapper = false, iteration_parameter = nothing, load_path = "MLJClusteringInterface.AffinityPropagation", package_license = "MIT", package_url = "https://github.com/JuliaStats/Clustering.jl", package_uuid = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5", predict_scitype = Unknown, prediction_type = :unknown, reporting_operations = (:predict,), reports_feature_importances = false, supports_class_weights = false, supports_online = false, supports_training_losses = false, supports_weights = false, target_in_fit = false, transform_scitype = Unknown, input_scitype = Tuple{Table{<:AbstractVector{<:Continuous}}}, target_scitype = Unknown, output_scitype = Unknown)
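Because localmodels() returns an ordinary vector of named tuples, standard Julia collection tools apply to it. The sketch below, assuming only that MLJ is installed, tallies the currently loaded models by providing package; `counts` is an illustrative name, and the tally depends on which packages your session has loaded.

```julia
using MLJ  # loading MLJ also loads some built-in models

ms = localmodels()  # metadata for models whose code is already loaded

# Count loaded models per providing package
counts = Dict{String,Int}()
for m in ms
    counts[m.package_name] = get(counts, m.package_name, 0) + 1
end

# Print the tally, sorted by package name
for (pkg, n) in sort(collect(counts))
    println(rpad(pkg, 25), n)
end
```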
One can search for models containing specified strings or regular expressions in their name or docstring attributes, as in
julia> models("forest")
12-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :constructor, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :target_in_fit, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = GeneralImputer, package_name = BetaML, ... )
 (name = IForestDetector, package_name = OutlierDetectionPython, ... )
 (name = RandomForestClassifier, package_name = DecisionTree, ... )
 (name = RandomForestClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = RandomForestImputer, package_name = BetaML, ... )
 (name = RandomForestRegressor, package_name = BetaML, ... )
 (name = RandomForestRegressor, package_name = DecisionTree, ... )
 (name = RandomForestRegressor, package_name = MLJScikitLearnInterface, ... )
 (name = StableForestClassifier, package_name = SIRUS, ... )
 (name = StableForestRegressor, package_name = SIRUS, ... )
 (name = StableRulesClassifier, package_name = SIRUS, ... )
 (name = StableRulesRegressor, package_name = SIRUS, ... )
or by specifying a filter (a Bool-valued function):
julia> filter(model) = model.is_supervised && model.input_scitype >: MLJ.Table(Continuous) && model.target_scitype >: AbstractVector{<:Multiclass{3}} && model.prediction_type == :deterministic
filter (generic function with 1 method)
julia> models(filter)
13-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :constructor, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :target_in_fit, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
 (name = DeterministicConstantClassifier, package_name = MLJModels, ... )
 (name = LinearSVC, package_name = LIBSVM, ... )
 (name = NuSVC, package_name = LIBSVM, ... )
 (name = PassiveAggressiveClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = PerceptronClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = RidgeCVClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = RidgeClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = SGDClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = SRRegressor, package_name = SymbolicRegression, ... )
 (name = SVC, package_name = LIBSVM, ... )
 (name = SVMClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = SVMLinearClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = SVMNuClassifier, package_name = MLJScikitLearnInterface, ... )
Multiple test arguments may be passed to models, in which case they are applied conjunctively: a model is listed only if it passes every test.
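To illustrate the conjunctive behavior, here is a small sketch, assuming MLJ is installed, in which two single-purpose tests are passed together and the result is compared with a single combined test. The helper names `pure_julia`, `handles_weights` and `ids` are hypothetical, chosen only for this example.

```julia
using MLJ

# Two independent Bool-valued tests on registry entries (hypothetical helpers)
pure_julia(m) = m.is_pure_julia
handles_weights(m) = m.supports_weights

# Several tests at once: a model is listed only if it passes all of them
both = models(pure_julia, handles_weights)

# The same search written as one combined test
combined = models(m -> pure_julia(m) && handles_weights(m))

# Identify entries by (name, package) for easy comparison
ids(ms) = [(m.name, m.package_name) for m in ms]
println(ids(both))
```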
Matching models to data
Common searches are streamlined with the help of the matching command, defined as follows:
- matching(model, X, y) == true exactly when model is supervised and admits inputs and targets with the scientific types of X and y, respectively;
- matching(model, X) == true exactly when model is unsupervised and admits inputs with the scientific types of X.
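As a small illustration, assuming MLJ is installed, the sketch below checks single registry entries against the built-in iris data, whose input is a table of Continuous columns and whose target is Multiclass{3}. The choice of ConstantClassifier and PCA, and the variable names, are illustrative only.

```julia
using MLJ

X, y = @load_iris  # table of four Continuous features, Multiclass{3} target

println(scitype(X))
println(scitype(y))

classifier = info("ConstantClassifier", pkg="MLJModels")  # a supervised model
pca = info("PCA", pkg="MultivariateStats")                 # an unsupervised model

sup_ok = matching(classifier, X, y)  # supervised, and X and y are admissible
unsup_ok = matching(pca, X)          # unsupervised, and X is admissible
pca_as_sup = matching(pca, X, y)     # PCA is not supervised, so this fails
```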
So, to search for all supervised probabilistic models handling input X and target y, one can define the testing function task by
task(model) = matching(model, X, y) && model.prediction_type == :probabilistic
and execute the search with
models(task)
Also defined are Bool-valued callable objects matching(model), matching(X, y) and matching(X), with the obvious behavior. For example, matching(X, y)(model) == matching(model, X, y).
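The curried form can be checked directly. In this sketch, assuming MLJ is installed and again using the built-in iris data, `checker` and `entry` are illustrative names; `checker` is the callable object returned by matching(X, y).

```julia
using MLJ

X, y = @load_iris

checker = matching(X, y)  # a callable object; no search happens yet

entry = info("ConstantClassifier", pkg="MLJModels")

# Calling the object on an entry agrees with the three-argument form
println(checker(entry) == matching(entry, X, y))
```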
So, to search for all models compatible with input X and target y, for example, one executes
models(matching(X, y))
while the probabilistic search defined above can also be written
models() do model
    matching(model, X, y) &&
        model.prediction_type == :probabilistic
end
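Putting the pieces together, here is a sketch of a complete search, assuming MLJ is installed and using the built-in iris data as a stand-in for your own X and y. The helper `ids` is a hypothetical name, and the exact lists returned depend on the registry version.

```julia
using MLJ

X, y = @load_iris

# All supervised models whose input and target scitypes admit X and y
compatible = models(matching(X, y))

# The narrower probabilistic search, written with a do-block
probabilistic = models() do model
    matching(model, X, y) && model.prediction_type == :probabilistic
end

# Identify entries by (name, package) so the two results can be compared
ids(ms) = Set((m.name, m.package_name) for m in ms)

println(length(compatible), " compatible, ",
        length(probabilistic), " of them probabilistic")
```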
API
MLJModels.models — Function
models(; wrappers=false)
List all models in the MLJ registry. Here and below, model means the registry metadata entry for a genuine model type (a proxy for types whose defining code may not be loaded). To include wrappers and other composite models, such as TunedModel and Stack, specify wrappers=true.
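For instance, the sketch below, assuming MLJ is installed, compares the default listing with one that includes wrappers. The names `plain`, `with_wrappers` and `names_of` are illustrative, and which extra entries appear depends on the registry version.

```julia
using MLJ

plain = models()                       # genuine model types only
with_wrappers = models(wrappers=true)  # should also list wrappers and composites

# Names that appear only when wrappers are included
names_of(ms) = Set(m.name for m in ms)
extra = setdiff(names_of(with_wrappers), names_of(plain))

println("Entries added by wrappers=true: ", sort(collect(extra)))
```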
models(filters...; wrappers=false)
List all models m for which filter(m) is true, for each filter in filters.
models(matching(X, y); wrappers=false)
List all supervised models compatible with training data X, y.
models(matching(X); wrappers=false)
List all unsupervised models compatible with training data X.
Example
If
task(model) = model.is_supervised && model.prediction_type == :probabilistic
then models(task) lists all supervised models making probabilistic predictions.
See also: localmodels.
models(needle::Union{AbstractString,Regex}; wrappers=false)
List all models whose name or docstring matches a given needle.
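A brief sketch of both kinds of needle, assuming MLJ is installed. A Regex gives finer control than a plain string; the case-insensitive pattern below is an illustrative choice, not a requirement of the API.

```julia
using MLJ

by_string = models("forest")   # plain substring search
by_regex = models(r"forest"i)  # case-insensitive regular expression

for m in by_regex
    println(m.name, " (", m.package_name, ")")
end
```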
MLJModels.localmodels — Function
localmodels(; modl=Main, wrappers=false)
localmodels(filters...; modl=Main, wrappers=false)
localmodels(needle::Union{AbstractString,Regex}; modl=Main, wrappers=false)
List all models currently available to the user from the module modl without importing a package, and which additionally pass through the specified filters. Here a filter is a Bool-valued function on models.
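As with models, filters and needles can be passed to localmodels. In this sketch, assuming MLJ is installed, the variable names are illustrative and the results depend on which model-providing packages the session has already loaded.

```julia
using MLJ

# Loaded models that are supervised
loaded_supervised = localmodels(m -> m.is_supervised)

# Loaded models mentioning "constant" in their name or docstring
loaded_constant = localmodels(r"constant"i)

println([m.name for m in loaded_constant])
```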
Use load_path to get the load path of a returned model, as in this example:
ms = localmodels()
model = ms[1]
load_path(model)
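A minimal sketch, assuming MLJ is installed, applying load_path to a registry entry obtained with info rather than from localmodels(); the returned string names the module and type from which the model code would be loaded. The variable names are illustrative.

```julia
using MLJ

entry = info("PCA", pkg="MultivariateStats")

# A String naming the interface module and model type
path = load_path(entry)
println(path)
```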