Reference
- MLJModelInterface.UnivariateFinite
- MLJModelInterface._model_cleaner
- MLJModelInterface._model_constructor
- MLJModelInterface._process_model_def
- MLJModelInterface._unpack!
- MLJModelInterface.classes
- MLJModelInterface.decoder
- MLJModelInterface.doc_header
- MLJModelInterface.evaluate
- MLJModelInterface.feature_importances
- MLJModelInterface.fit
- MLJModelInterface.fitted_params
- MLJModelInterface.flat_params
- MLJModelInterface.int
- MLJModelInterface.inverse_transform
- MLJModelInterface.is_same_except
- MLJModelInterface.isrepresented
- MLJModelInterface.istable
- MLJModelInterface.matrix
- MLJModelInterface.metadata_model
- MLJModelInterface.metadata_pkg
- MLJModelInterface.nrows
- MLJModelInterface.params
- MLJModelInterface.predict
- MLJModelInterface.predict_joint
- MLJModelInterface.predict_mean
- MLJModelInterface.predict_median
- MLJModelInterface.predict_mode
- MLJModelInterface.reformat
- MLJModelInterface.report
- MLJModelInterface.schema
- MLJModelInterface.scitype
- MLJModelInterface.select
- MLJModelInterface.selectcols
- MLJModelInterface.selectrows
- MLJModelInterface.selectrows
- MLJModelInterface.synthesize_docstring
- MLJModelInterface.table
- MLJModelInterface.training_losses
- MLJModelInterface.transform
- MLJModelInterface.update
- StatisticalTraits.deep_properties
- MLJModelInterface.@mlj_model
MLJModelInterface.UnivariateFinite — Function
UnivariateFinite(
    support,
    probs;
    pool=nothing,
    augment=false,
    ordered=false
)
Construct a discrete univariate distribution whose finite support is the elements of the vector support, and whose corresponding probabilities are elements of the vector probs. Alternatively, construct an abstract array of UnivariateFinite distributions by choosing probs to be an array of one higher dimension than the array generated.
Here the word "probabilities" is an abuse of terminology, as there is no requirement that the probabilities actually sum to one, only that they be non-negative. So UnivariateFinite objects actually implement arbitrary non-negative measures over finite sets of labelled points. A UnivariateFinite distribution will be a bona fide probability measure when constructed using the augment=true option (see below) or when fit to data.
Unless pool is specified, support should have type AbstractVector{<:CategoricalValue} and all elements are assumed to share the same categorical pool, which may be larger than support.
Important. All levels of the common pool have associated probabilities, not just those in the specified support. However, the probabilities of levels outside the specified support are always zero (see example below).
If probs is a matrix, it should have a column for each class in support (or one less, if augment=true). More generally, probs will be an array whose size is of the form (n1, n2, ..., nk, c), where c = length(support) (or one less, if augment=true) and the constructor then returns an array of UnivariateFinite distributions of size (n1, n2, ..., nk).
Examples
julia> v = categorical(["x", "x", "y", "x", "z"])
5-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "x"
 "x"
 "y"
 "x"
 "z"
julia> UnivariateFinite(classes(v), [0.2, 0.3, 0.5])
UnivariateFinite{Multiclass{3}}(x=>0.2, y=>0.3, z=>0.5)
julia> d = UnivariateFinite([v[1], v[end]], [0.1, 0.9])
UnivariateFinite{Multiclass{3}}(x=>0.1, z=>0.9)
julia> rand(d, 3)
3-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "x"
 "z"
 "x"
julia> levels(d)
3-element Vector{String}:
 "x"
 "y"
 "z"
julia> pdf(d, "y")
0.0
Specifying a pool
Alternatively, support may be a list of raw (non-categorical) elements if pool is:
- some CategoricalArray, CategoricalValue or CategoricalPool, such that support is a subset of levels(pool)
- missing, in which case a new categorical pool is created which has support as its only levels.
In the last case, specify ordered=true if the pool is to be considered ordered.
julia> UnivariateFinite(["x", "z"], [0.1, 0.9], pool=missing, ordered=true)
UnivariateFinite{OrderedFactor{2}}(x=>0.1, z=>0.9)
julia> d = UnivariateFinite(["x", "z"], [0.1, 0.9], pool=v) # v defined above
UnivariateFinite{Multiclass{3}}(x=>0.1, z=>0.9)
julia> pdf(d, "y") # allowed as `"y" in levels(v)`
0.0
julia> v = categorical(["x", "x", "y", "x", "z", "w"])
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "x"
 "x"
 "y"
 "x"
 "z"
 "w"
julia> probs = rand(100, 3); probs = probs ./ sum(probs, dims=2);
julia> UnivariateFinite(["x", "y", "z"], probs, pool=v)
100-element UnivariateFiniteVector{Multiclass{4}, String, UInt32, Float64}:
 UnivariateFinite{Multiclass{4}}(x=>0.194, y=>0.3, z=>0.505)
 UnivariateFinite{Multiclass{4}}(x=>0.727, y=>0.234, z=>0.0391)
 UnivariateFinite{Multiclass{4}}(x=>0.674, y=>0.00535, z=>0.321)
 ⋮
 UnivariateFinite{Multiclass{4}}(x=>0.292, y=>0.339, z=>0.369)
Probability augmentation
If augment=true the provided array is augmented by inserting appropriate elements ahead of those provided, along the last dimension of the array. This means the user only provides probabilities for the classes c2, c3, ..., cn. The class c1 probabilities are chosen so that each UnivariateFinite distribution in the returned array is a bona fide probability distribution.
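The arithmetic behind augmentation can be sketched in plain Julia, without calling the constructor itself. This is only an illustration of the rule described above, assuming a two-observation example with three classes; the variable names are hypothetical and the real constructor may organize the computation differently.

```julia
# Probabilities supplied by the user for classes c2 and c3 (one row per observation).
partial = [0.3 0.5;
           0.1 0.2]

# Probability for c1, chosen so that each row sums to one.
first_class = 1 .- sum(partial, dims=2)

# Insert the new column ahead of the provided ones, along the last dimension.
augmented = hcat(first_class, partial)
```

Each row of `augmented` now has one entry per class and sums to one, up to floating point rounding.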
UnivariateFinite(prob_given_class; pool=nothing, ordered=false)
Construct a discrete univariate distribution whose finite support is the set of keys of the provided dictionary, prob_given_class, and whose values specify the corresponding probabilities.
The type requirements on the keys of the dictionary are the same as the elements of support given above with this exception: if non-categorical elements (raw labels) are used as keys, then pool=... must be specified and cannot be missing.
If the values (probabilities) are arrays instead of scalars, then an abstract array of UnivariateFinite elements is created, with the same size as the array.
MLJModelInterface.classes — Method
classes(x)
All the categorical elements with the same pool as x (including x), returned as a list, with an ordering consistent with the pool. Here x has CategoricalValue type, and classes(x) is a vector of the same eltype. Note that x in classes(x) is always true.
Not to be confused with levels(x.pool). See the example below.
julia> v = categorical(["c", "b", "c", "a"])
4-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "c"
 "b"
 "c"
 "a"
julia> levels(v)
3-element Vector{String}:
 "a"
 "b"
 "c"
julia> x = v[4]
CategoricalArrays.CategoricalValue{String, UInt32} "a"
julia> classes(x)
3-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "a"
 "b"
 "c"
julia> levels(x.pool)
3-element Vector{String}:
 "a"
 "b"
 "c"
MLJModelInterface.decoder — Method
decoder(x)
Return a callable object for decoding the integer representation of a CategoricalValue sharing the same pool as the CategoricalValue x. Specifically, one has decoder(x)(int(y)) == y for all CategoricalValues y having the same pool as x. One can also call decoder(x) on integer arrays, in which case decoder(x) is broadcast over all elements.
Examples
julia> v = categorical(["c", "b", "c", "a"])
4-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "c"
 "b"
 "c"
 "a"
julia> int(v)
4-element Vector{UInt32}:
 0x00000003
 0x00000002
 0x00000003
 0x00000001
julia> d = decoder(v[3]);
julia> d(int(v)) == v
true
Warning: It is not true that int(d(u)) == u always holds.
See also: int.
MLJModelInterface.evaluate — Function
Some meta-models may choose to implement the evaluate operation.
MLJModelInterface.fit — Function
MLJModelInterface.fit(model, verbosity, data...) -> fitresult, cache, report
All models must implement a fit method. Here data is the output of reformat on user-provided data, or some resampling thereof. The fallback of reformat returns the user-provided data (e.g., a table).
MLJModelInterface.fitted_params — Method
fitted_params(model, fitresult) -> human_readable_fitresult # named_tuple
Models may overload fitted_params. The fallback returns (fitresult=fitresult,).
Other training-related outcomes should be returned in the report part of the tuple returned by fit.
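A minimal sketch of how fit, fitted_params and predict fit together is given below. The model type MeanRegressor is hypothetical and exists only for this illustration; the sketch also assumes that the data passed to fit and predict is a plain matrix with one row per observation, which a real implementation need not assume.

```julia
import MLJModelInterface as MMI
using Statistics: mean

# Hypothetical model type, for illustration only: predicts the training-target mean.
mutable struct MeanRegressor <: MMI.Deterministic end

# `fit` returns the triple (fitresult, cache, report).
function MMI.fit(::MeanRegressor, verbosity::Int, X, y)
    fitresult = mean(y)
    cache = nothing
    report = (n_observations = length(y),)
    return fitresult, cache, report
end

# Optional: expose the learned parameter under a readable name.
MMI.fitted_params(::MeanRegressor, fitresult) = (mean = fitresult,)

# `predict` receives the fitresult returned by `fit`.
# Assumption: Xnew is a matrix with one row per observation.
MMI.predict(::MeanRegressor, fitresult, Xnew) = fill(fitresult, size(Xnew, 1))

X = rand(5, 2)
y = [1.0, 2.0, 3.0, 4.0, 5.0]
fitresult, cache, report = MMI.fit(MeanRegressor(), 0, X, y)
yhat = MMI.predict(MeanRegressor(), fitresult, rand(3, 2))
```

Anything that is not a learned parameter, such as the observation count here, goes in the report rather than in the value returned by fitted_params.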
MLJModelInterface.int — Method
int(x)
The positional integer of the CategoricalString or CategoricalValue x, in the ordering defined by the pool of x. The type of int(x) is the reference type of x.
Not to be confused with x.ref, which is unchanged by reordering of the pool of x, but has the same type.
int(X::CategoricalArray)
int(W::Array{<:CategoricalString})
int(W::Array{<:CategoricalValue})
Broadcasted versions of int.
julia> v = categorical(["c", "b", "c", "a"])
4-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "c"
 "b"
 "c"
 "a"
julia> levels(v)
3-element Vector{String}:
 "a"
 "b"
 "c"
julia> int(v)
4-element Vector{UInt32}:
 0x00000003
 0x00000002
 0x00000003
 0x00000001
See also: decoder.
MLJModelInterface.inverse_transform — Function
Unsupervised models may implement the inverse_transform operation.
MLJModelInterface.is_same_except — Method
is_same_except(m1, m2, exceptions::Symbol...; deep_properties=Symbol[])
If both m1 and m2 are of MLJType, return true if the following conditions all hold, and false otherwise:
- typeof(m1) === typeof(m2)
- propertynames(m1) === propertynames(m2)
- with the exception of properties listed as exceptions or bound to an AbstractRNG, each pair of corresponding property values is either "equal" or both undefined. (If a property appears as a propertyname but not a fieldname, it is deemed as always defined.)
The meaning of "equal" depends on the type of the property value:
- values that are themselves of MLJType are "equal" if they are equal in the sense of is_same_except with no exceptions.
- values that are not of MLJType are "equal" if they are ==.
In the special case of a "deep" property, "equal" has a different meaning; see deep_properties for details.
If m1 or m2 are not MLJType objects, then return ==(m1, m2).
MLJModelInterface.isrepresented — Method
isrepresented(object::MLJType, objects)
Test if object has a representative in the iterable objects. This is a weaker requirement than object in objects.
Here we say m1 represents m2 if is_same_except(m1, m2) is true.
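The comparison rules above can be tried on a small example. The type Ridge below is hypothetical and defined only for this sketch; the comments state the results expected from the conditions listed for is_same_except.

```julia
import MLJModelInterface as MMI

# Hypothetical MLJType subtype, for illustration only.
mutable struct Ridge <: MMI.MLJType
    lambda::Float64
    fit_intercept::Bool
end

m1 = Ridge(0.1, true)
m2 = Ridge(0.5, true)

same = MMI.is_same_except(m1, m2)                    # false: `lambda` differs
same_up_to_lambda = MMI.is_same_except(m1, m2, :lambda)  # true: `lambda` is excluded
represented = MMI.isrepresented(Ridge(0.1, true), [m1, m2])  # true: agrees with m1
```

Note that the last call succeeds even though the freshly constructed Ridge(0.1, true) is a different object from m1, since only property values are compared.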
MLJModelInterface.matrix — Method
matrix(X; transpose=false)
If X isa AbstractMatrix, return X or permutedims(X) if transpose=true. Otherwise if X is a Tables.jl compatible table source, convert X into a Matrix.
MLJModelInterface.metadata_model — Method
metadata_model(T; args...)
Helper function to write the metadata for a model T.
Keywords
- input_scitype=Unknown: allowed scientific type of the input data
- target_scitype=Unknown: allowed scitype of the target (supervised)
- output_scitype=Unknown: allowed scitype of the transformed data (unsupervised)
- supports_weights=false: whether the model supports sample weights
- supports_class_weights=false: whether the model supports class weights
- load_path="unknown": where the model is (usually PackageName.ModelName)
- human_name=nothing: human name of the model
- supports_training_losses=nothing: whether the (necessarily iterative) model can report training losses
- reports_feature_importances=nothing: whether the model reports feature importances
Example
metadata_model(KNNRegressor,
    input_scitype=MLJModelInterface.Table(MLJModelInterface.Continuous),
    target_scitype=AbstractVector{MLJModelInterface.Continuous},
    supports_weights=true,
    load_path="NearestNeighbors.KNNRegressor")
MLJModelInterface.metadata_pkg — Method
metadata_pkg(T; args...)
Helper function to write the metadata for a package providing model T. Use it with broadcasting to define the metadata of the package providing a series of models.
Keywords
- package_name="unknown": package name
- package_uuid="unknown": package uuid
- package_url="unknown": package url
- is_pure_julia=missing: whether the package is pure Julia
- package_license="unknown": package license
- is_wrapper=false: whether the package is a wrapper
Example
metadata_pkg.((KNNRegressor, KNNClassifier),
    package_name="NearestNeighbors",
    package_uuid="b8a86587-4115-5ab1-83bc-aa920d37bbce",
    package_url="https://github.com/KristofferC/NearestNeighbors.jl",
    is_pure_julia=true,
    package_license="MIT",
    is_wrapper=false)
MLJModelInterface.nrows — Method
nrows(X)
Return the number of rows for a table, AbstractVector or AbstractMatrix, X.
MLJModelInterface.params — Method
params(m::MLJType)
Recursively convert any transparent object m into a named tuple, keyed on the fields of m. An object is transparent if MLJModelInterface.istransparent(m) == true. The named tuple is possibly nested because params is recursively applied to the field values, which themselves might be transparent.
Most objects of type MLJType are transparent.
julia> params(EnsembleModel(model=ConstantClassifier()))
(model = (target_type = Bool,),
 weights = Float64[],
 bagging_fraction = 0.8,
 rng_seed = 0,
 n = 100,
 parallel = true,)
MLJModelInterface.predict — Function
predict(model, fitresult, new_data...)
Supervised and SupervisedAnnotator models must implement the predict operation. Here new_data is the output of reformat called on user-specified data.
MLJModelInterface.predict_joint — Function
JointProbabilistic supervised models MUST overload predict_joint.
Probabilistic supervised models MAY overload predict_joint.
MLJModelInterface.predict_mean — Function
Model types M for which prediction_type(M) == :probabilistic may overload predict_mean.
MLJModelInterface.predict_median — Function
Model types M for which prediction_type(M) == :probabilistic may overload predict_median.
MLJModelInterface.predict_mode — Function
Model types M for which prediction_type(M) == :probabilistic may overload predict_mode.
MLJModelInterface.reformat — Method
MLJModelInterface.reformat(model, args...) -> data
Models optionally overload reformat to define transformations of user-supplied data into some model-specific representation (e.g., from a table to a matrix). When implemented, the MLJ user can avoid repeating such transformations unnecessarily, and can additionally make use of more efficient row subsampling, which is then based on the model-specific representation of data, rather than the user-representation. When reformat is overloaded, selectrows(::Model, ...) must be as well (see selectrows). Furthermore, the model fit method(s), and operations, such as predict and transform, must be refactored to act on the model-specific representations of the data.
To implement the reformat data front-end for a model, refer to "Implementing a data front-end" in the MLJ manual.
MLJModelInterface.scitype — Method
scitype(X)
The scientific type (interpretation) of X, distinct from its machine type.
Examples
julia> scitype(3.14)
Continuous
julia> scitype([1, 2, missing])
AbstractVector{Union{Missing, Count}} 
julia> scitype((5, "beige"))
Tuple{Count, Textual}
julia> using CategoricalArrays
julia> X = (gender = categorical(['M', 'M', 'F', 'M', 'F']),
            ndevices = [1, 3, 2, 3, 2]);
julia> scitype(X)
Table{Union{AbstractVector{Count}, AbstractVector{Multiclass{2}}}}
MLJModelInterface.select — Function
select(X, r, c)
Select element(s) of a table or matrix at row(s) r and column(s) c. An object of the sink type of X (or a matrix) is returned unless c is a single integer or symbol. In that case a vector is returned, unless r is a single integer, in which case a single element is returned.
See also: selectrows, selectcols.
MLJModelInterface.selectcols — Function
selectcols(X, c)
Select single or multiple columns from a matrix or table X. If c is an abstract vector of integers or symbols, then the object returned is a table of the preferred sink type of typeof(X). If c is a single integer or symbol, then an AbstractVector is returned.
MLJModelInterface.selectrows — Function
selectrows(X, r)
Select single or multiple rows from a table, abstract vector or matrix X. If X is tabular, the object returned is a table of the preferred sink type of typeof(X), even if only a single row is selected.
If the object is neither a table, abstract vector nor matrix, X is returned and r is ignored.
MLJModelInterface.selectrows — Method
MLJModelInterface.selectrows(::Model, I, data...) -> sampled_data
A model overloads selectrows whenever it buys into the optional reformat front-end for data preprocessing. See reformat for details. The fallback assumes data is a tuple and calls selectrows(X, I) for each X in data, returning the results in a new tuple of the same length. This call makes sense when X is a table, abstract vector or abstract matrix. In the last two cases, a new object and not a view is returned.
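A data front-end pairs a reformat overload with a matching selectrows overload. The sketch below uses a hypothetical model type SomeSupervised and assumes, for simplicity, that the user supplies X as a matrix with observations as rows; the model-specific representation chosen here (features as rows, observations as columns) is likewise an assumption made for illustration.

```julia
import MLJModelInterface as MMI

# Hypothetical model type, for illustration only.
mutable struct SomeSupervised <: MMI.Deterministic end

# Model-specific representation: a (features x observations) matrix.
# Assumption: X is already an AbstractMatrix with observations as rows.
MMI.reformat(::SomeSupervised, X, y) = (permutedims(X), y)
MMI.reformat(::SomeSupervised, X) = (permutedims(X),)

# Observation selection acts on the model-specific representation,
# so it indexes columns of the permuted matrix.
MMI.selectrows(::SomeSupervised, I, Xt, y) = (view(Xt, :, I), view(y, I))
MMI.selectrows(::SomeSupervised, I, Xt) = (view(Xt, :, I),)

X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [10.0, 20.0, 30.0]
Xt, yt = MMI.reformat(SomeSupervised(), X, y)
Xs, ys = MMI.selectrows(SomeSupervised(), 2:3, Xt, yt)
```

With such a front-end in place, fit and predict for the model would need to accept the permuted matrix rather than the user-supplied data.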
MLJModelInterface.table — Method
table(columntable; prototype=nothing)
Convert a named tuple of vectors or tuples columntable, into a table of the "preferred sink type" of prototype. This is often the type of prototype itself, when prototype is a sink; see the Tables.jl documentation. If prototype is not specified, then a named tuple of vectors is returned.
table(A::AbstractMatrix; names=nothing, prototype=nothing)
Wrap an abstract matrix A as a Tables.jl compatible table with the specified column names (a tuple of symbols). If names are not specified, names=(:x1, :x2, ..., :xn) is used, where n=size(A, 2).
If a prototype is specified, then the matrix is materialized as a table of the preferred sink type of prototype, rather than wrapped. Note that if prototype is not specified, then matrix(table(A)) is essentially a no-op.
MLJModelInterface.training_losses — Method
MLJModelInterface.training_losses(model::M, report)
If M is an iterative model type which calculates training losses, implement this method to return an AbstractVector of the losses in historical order. If the model calculates scores instead, then the sign of the scores should be reversed.
The following trait overload is also required: MLJModelInterface.supports_training_losses(::Type{<:M}) = true.
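One possible arrangement, sketched below, is for fit to store the losses in its report and for training_losses to read them back. The model type ToyIterative and the report field name losses are hypothetical, and the loss values are placeholders rather than the output of any real optimization.

```julia
import MLJModelInterface as MMI

# Hypothetical iterative model, for illustration only.
mutable struct ToyIterative <: MMI.Deterministic
    n_iter::Int
end

function MMI.fit(model::ToyIterative, verbosity::Int, X, y)
    # Placeholder for real per-iteration losses, in historical order.
    losses = [1 / k for k in 1:model.n_iter]
    fitresult = nothing
    cache = nothing
    report = (losses = losses,)
    return fitresult, cache, report
end

# Trait declaration plus the method that extracts the losses from the report.
MMI.supports_training_losses(::Type{<:ToyIterative}) = true
MMI.training_losses(::ToyIterative, report) = report.losses

model = ToyIterative(3)
_, _, report = MMI.fit(model, 0, nothing, nothing)
losses = MMI.training_losses(model, report)
```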
MLJModelInterface.transform — Function
Unsupervised models must implement the transform operation.
MLJModelInterface.update — Method
MLJModelInterface.update(model, verbosity, fitresult, cache, data...)
Models may optionally implement an update method. The fallback calls fit.
StatisticalTraits.deep_properties — Function
deep_properties(::Type{<:MLJType})
Given an MLJType subtype M, the value of this trait should be a tuple of any properties of M to be regarded as "deep".
When two instances of type M are to be tested for equality, in the sense of == or is_same_except, then the values of a "deep" property (whose values are assumed to be of composite type) are deemed to agree if all corresponding properties of those property values are ==.
Any property of M whose values are themselves of MLJType is "deep" automatically, and should not be included in the trait return value.
See also is_same_except.
Example
Consider an MLJType subtype Foo, with a single field of type Bar which is not a subtype of MLJType:
mutable struct Bar
    x::Int
end
mutable struct Foo <: MLJType
    bar::Bar
end
Then the mutability of Bar implies Bar(1) != Bar(1) and so, by the definition of == for MLJType objects (see is_same_except), we have
Foo(Bar(1)) != Foo(Bar(1))
However, after the declaration
MLJModelInterface.deep_properties(::Type{<:Foo}) = (:bar,)
we have
Foo(Bar(1)) == Foo(Bar(1))
MLJModelInterface.@mlj_model — Macro
@mlj_model
Macro to help define MLJ models with constraints on the default parameters.
MLJModelInterface._model_cleaner — Method
_model_cleaner(modelname, defaults, constraints)
Build the expression of the cleaner associated with the constraints specified in a model def.
MLJModelInterface._model_constructor — Method
_model_constructor(modelname, params, defaults)
Build the expression of the keyword constructor associated with a model definition. When the constructor is called, the clean! function is called as well to check that parameter assignments are valid.
MLJModelInterface._process_model_def — Method
_process_model_def(modl, ex)
Take an expression defining a model (mutable struct Model ...) and unpack key elements for further processing:
- Model name (modelname)
- Names of parameters (params)
- Default values (defaults)
- Constraints (constraints)
When no default field value is given, a heuristic is used to guess an appropriate default (e.g., zero for a Float64 parameter). To this end, the specified type expression is evaluated in the module modl.
MLJModelInterface._unpack! — Method
_unpack!(ex, rep)
Internal function that reads a constraint given after a default value for a parameter and transforms it into an executable condition (which is returned to be executed later). For instance, if we have
alpha::Int = 0.5::(arg > 0.0)
then it would transform (arg > 0.0) into (alpha > 0.0), which is executable.
MLJModelInterface.doc_header — Method
MLJModelInterface.doc_header(SomeModelType; augment=false)
Return a string suitable for interpolation in the document string of an MLJ model type. In the example given below, the header expands to something like this:
FooRegressor
A model type for constructing a foo regressor, based on FooRegressorPkg.jl.
From MLJ, the type can be imported using
FooRegressor = @load FooRegressor pkg=FooRegressorPkg
Construct an instance with default hyper-parameters using the syntax model = FooRegressor(). Provide keyword arguments to override hyper-parameter defaults, as in FooRegressor(a=...).
Ordinarily, doc_header is used in document strings defined after the model type definition, as doc_header assumes model traits (in particular, package_name and package_url) to be defined; see also MLJModelInterface.metadata_pkg.
Example
Suppose a model type and traits have been defined by:
mutable struct FooRegressor
    a::Int
    b::Float64
end
metadata_pkg(FooRegressor,
    name="FooRegressorPkg",
    uuid="10745b16-79ce-11e8-11f9-7d13ad32a3b2",
    url="http://existentialcomics.com/",
    )
metadata_model(FooRegressor,
    input=Table(Continuous),
    target=AbstractVector{Continuous})
Then the docstring is defined after these declarations with the following code:
"""
$(MLJModelInterface.doc_header(FooRegressor))
### Training data
In MLJ or MLJBase, bind an instance `model` ...
<rest of doc string goes here>
"""
FooRegressor
Variation to augment existing document string
For models that have a native API with separate documentation, one may want to call doc_header(FooRegressor, augment=true) instead. In that case, the output will look like this:
From MLJ, the FooRegressor type can be imported using
FooRegressor = @load FooRegressor pkg=FooRegressorPkg
Construct an instance with default hyper-parameters using the syntax model = FooRegressor(). Provide keyword arguments to override hyper-parameter defaults, as in FooRegressor(a=...).
MLJModelInterface.feature_importances — Function
feature_importances(model::M, fitresult, report)
For a given model of model type M supporting intrinsic feature importances, calculate the feature importances from the model's fitresult and report as an abstract vector of feature::Symbol => importance::Real pairs (e.g., [:gender => 0.23, :height => 0.7, :weight => 0.1]).
New model implementations
The following trait overload is also required: MLJModelInterface.reports_feature_importances(::Type{<:M}) = true
If for some reason a model is sometimes unable to report feature importances then feature_importances should return all importances as 0.0, as in [:gender =>0.0, :height =>0.0, :weight => 0.0].
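The sketch below shows the shape of such an implementation. The model type ToyLinear is hypothetical, and both the layout of its fitresult (a named tuple with fields features and coefs) and the use of absolute coefficient values as importances are assumptions made purely for illustration.

```julia
import MLJModelInterface as MMI

# Hypothetical model, for illustration only. Its fitresult is assumed to be a
# named tuple with fields `features` (symbols) and `coefs` (real numbers).
mutable struct ToyLinear <: MMI.Deterministic end

# Required trait declaration.
MMI.reports_feature_importances(::Type{<:ToyLinear}) = true

# Return a vector of `feature => importance` pairs.
function MMI.feature_importances(::ToyLinear, fitresult, report)
    importances = abs.(fitresult.coefs)
    return [f => imp for (f, imp) in zip(fitresult.features, importances)]
end

fitresult = (features = [:gender, :height], coefs = [-0.5, 2.0])
importances = MMI.feature_importances(ToyLinear(), fitresult, nothing)
```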
MLJModelInterface.flat_params — Method
flat_params(m::Model)
Deconstruct any Model instance m as a flat named tuple, keyed on property names. Properties of nested model instances are recursively exposed, as shown in the example below. For most Model objects, properties are synonymous with fields, but this is not a hard requirement.
julia> using MLJModels
julia> using EnsembleModels
julia> tree = (@load DecisionTreeClassifier pkg=DecisionTree)();
julia> flat_params(EnsembleModel(model=tree))
(model__max_depth = -1,
 model__min_samples_leaf = 1,
 model__min_samples_split = 2,
 model__min_purity_increase = 0.0,
 model__n_subfeatures = 0,
 model__post_prune = false,
 model__merge_purity_threshold = 1.0,
 model__display_depth = 5,
 model__feature_importance = :impurity,
 model__rng = Random._GLOBAL_RNG(),
 atomic_weights = Float64[],
 bagging_fraction = 0.8,
 rng = Random._GLOBAL_RNG(),
 n = 100,
 acceleration = CPU1{Nothing}(nothing),
 out_of_bag_measure = Any[],)
MLJModelInterface.istable — Method
istable(X)
Return true if X is tabular.
MLJModelInterface.report — Method
MLJModelInterface.report(model, report_given_method)
Merge the reports in the dictionary report_given_method into a single property-accessible object. It is supposed that each key of the dictionary is either :fit or the name of an operation, such as :predict or :transform. Each value will be the report component returned by a training method (fit or update) dispatched on the model type, in the case of :fit, or the report component returned by an operation that supports reporting.
New model implementations
Overloading this method is optional, unless the model generates reports that are neither named tuples nor nothing.
Assuming each value in the report_given_method dictionary is either a named tuple or nothing, and there are no conflicts between the keys of the dictionary values (the individual reports), the fallback returns the usual named tuple merge of the dictionary values, ignoring any nothing value. If there is a key conflict, all operation reports are first wrapped in a named tuple of length one, as in (predict=predict_report,). A :fit report is never wrapped.
If any dictionary value is neither a named tuple nor nothing, it is first wrapped as (report=value, ) before merging.
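The simplest case described above, where every value is a named tuple or nothing and no keys conflict, amounts to an ordinary named tuple merge. The following plain-Julia sketch imitates that case only; it does not call the library fallback, and the report field names are hypothetical.

```julia
# Report components keyed on :fit or an operation name.
report_given_method = Dict(
    :fit => (n_iter = 10,),
    :predict => (n_clipped = 2,),
    :transform => nothing,
)

# Ignore `nothing` values and merge the remaining named tuples.
components = [r for r in values(report_given_method) if r !== nothing]
merged = merge(components...)
```

The result exposes both `merged.n_iter` and `merged.n_clipped`; the order of the fields depends on the dictionary's iteration order.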
MLJModelInterface.schema — Method
schema(X)
Inspect the column types and scitypes of a tabular object. Returns nothing if the column types and scitypes can't be inspected.
MLJModelInterface.synthesize_docstring — Method
synthesize_docstring
Private method.
Generates a value for the docstring trait for use with a model which does not have a standard document string, to use as the fallback. See metadata_model.