Reference

MLJModelInterface.UnivariateFinite (Function)
UnivariateFinite(
    support,
    probs;
    pool=nothing,
    augment=false,
    ordered=false
)

Construct a discrete univariate distribution whose finite support is the elements of the vector support, and whose corresponding probabilities are elements of the vector probs. Alternatively, construct an abstract array of UnivariateFinite distributions by choosing probs to be an array of one higher dimension than the array generated.

Here the word "probabilities" is an abuse of terminology, as there is no requirement that the probabilities actually sum to one, only that they be non-negative. So UnivariateFinite objects actually implement arbitrary non-negative measures over finite sets of labelled points. A UnivariateFinite distribution will be a bona fide probability measure when constructed using the augment=true option (see below) or when fit to data.

Unless pool is specified, support should have type AbstractVector{<:CategoricalValue} and all elements are assumed to share the same categorical pool, which may be larger than support.

Important. All levels of the common pool have associated probabilities, not just those in the specified support. However, these probabilities are always zero (see example below).

If probs is a matrix, it should have a column for each class in support (or one less, if augment=true). More generally, probs will be an array whose size is of the form (n1, n2, ..., nk, c), where c = length(support) (or one less, if augment=true) and the constructor then returns an array of UnivariateFinite distributions of size (n1, n2, ..., nk).

Examples

julia> v = categorical(["x", "x", "y", "x", "z"])
5-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "x"
 "x"
 "y"
 "x"
 "z"

julia> UnivariateFinite(classes(v), [0.2, 0.3, 0.5])
UnivariateFinite{Multiclass{3}}(x=>0.2, y=>0.3, z=>0.5)

julia> d = UnivariateFinite([v[1], v[end]], [0.1, 0.9])
UnivariateFinite{Multiclass{3}}(x=>0.1, z=>0.9)

julia> rand(d, 3)
3-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "x"
 "z"
 "x"

julia> levels(d)
3-element Vector{String}:
 "x"
 "y"
 "z"

julia> pdf(d, "y")
0.0

Specifying a pool

Alternatively, support may be a list of raw (non-categorical) elements if pool is:

  • some CategoricalArray, CategoricalValue or CategoricalPool, such that support is a subset of levels(pool)

  • missing, in which case a new categorical pool is created which has support as its only levels.

In the last case, specify ordered=true if the pool is to be considered ordered.

julia> UnivariateFinite(["x", "z"], [0.1, 0.9], pool=missing, ordered=true)
UnivariateFinite{OrderedFactor{2}}(x=>0.1, z=>0.9)

julia> d = UnivariateFinite(["x", "z"], [0.1, 0.9], pool=v) # v defined above
UnivariateFinite{Multiclass{3}}(x=>0.1, z=>0.9)

julia> pdf(d, "y") # allowed as `"y" in levels(v)`
0.0

julia> v = categorical(["x", "x", "y", "x", "z", "w"])
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "x"
 "x"
 "y"
 "x"
 "z"
 "w"

julia> probs = rand(100, 3); probs = probs ./ sum(probs, dims=2);

julia> UnivariateFinite(["x", "y", "z"], probs, pool=v)
100-element UnivariateFiniteVector{Multiclass{4}, String, UInt32, Float64}:
 UnivariateFinite{Multiclass{4}}(x=>0.194, y=>0.3, z=>0.505)
 UnivariateFinite{Multiclass{4}}(x=>0.727, y=>0.234, z=>0.0391)
 UnivariateFinite{Multiclass{4}}(x=>0.674, y=>0.00535, z=>0.321)
 ⋮
 UnivariateFinite{Multiclass{4}}(x=>0.292, y=>0.339, z=>0.369)

Probability augmentation

If augment=true the provided array is augmented by inserting appropriate elements ahead of those provided, along the last dimension of the array. This means the user only provides probabilities for the classes c2, c3, ..., cn. The class c1 probabilities are chosen so that each UnivariateFinite distribution in the returned array is a bona fide probability distribution.
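
For instance, the following sketch (not part of the original docstring) constructs an augmented distribution and an augmented vector of binary distributions, assuming a package exporting UnivariateFinite and pdf (such as MLJBase or CategoricalDistributions) is loaded:

d = UnivariateFinite(["x", "y", "z"], [0.2, 0.3], augment=true, pool=missing)
pdf(d, "x")   # 0.5, inserted so that the probabilities sum to one
pdf(d, "y")   # 0.2

# vector case: one column of probabilities for "yes"; "no" is augmented in front
yes_probs = reshape([0.1, 0.4, 0.8], :, 1)
d_vec = UnivariateFinite(["no", "yes"], yes_probs, augment=true, pool=missing)
pdf.(d_vec, "no")   # [0.9, 0.6, 0.2]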


UnivariateFinite(prob_given_class; pool=nothing, ordered=false)

Construct a discrete univariate distribution whose finite support is the set of keys of the provided dictionary, prob_given_class, and whose values specify the corresponding probabilities.

The type requirements on the keys of the dictionary are the same as the elements of support given above with this exception: if non-categorical elements (raw labels) are used as keys, then pool=... must be specified and cannot be missing.

If the values (probabilities) are arrays instead of scalars, then an abstract array of UnivariateFinite elements is created, with the same size as the array.
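
For example (a sketch assuming the categorical vector v defined above, and a package exporting UnivariateFinite, such as MLJBase or CategoricalDistributions):

d = UnivariateFinite(Dict("x" => 0.2, "y" => 0.8), pool=v)
pdf(d, "y")   # 0.8
pdf(d, "z")   # 0.0 ("z" belongs to the pool of v but not to the support)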

MLJModelInterface.classes (Method)
classes(x)

All the categorical elements with the same pool as x (including x), returned as a list, with an ordering consistent with the pool. Here x has CategoricalValue type, and classes(x) is a vector of the same eltype. Note that x in classes(x) is always true.

Not to be confused with levels(x.pool). See the example below.

julia> v = categorical(["c", "b", "c", "a"])
4-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "c"
 "b"
 "c"
 "a"

julia> levels(v)
3-element Vector{String}:
 "a"
 "b"
 "c"

julia> x = v[4]
CategoricalArrays.CategoricalValue{String, UInt32} "a"

julia> classes(x)
3-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "a"
 "b"
 "c"

julia> levels(x.pool)
3-element Vector{String}:
 "a"
 "b"
 "c"

MLJModelInterface.decoder (Method)
decoder(x)

Return a callable object for decoding the integer representation of a CategoricalValue sharing the same pool as the CategoricalValue x. Specifically, one has decoder(x)(int(y)) == y for all CategoricalValues y having the same pool as x. One can also call decoder(x) on integer arrays, in which case decoder(x) is broadcast over all elements.

Examples

julia> v = categorical(["c", "b", "c", "a"])
4-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "c"
 "b"
 "c"
 "a"

julia> int(v)
4-element Vector{UInt32}:
 0x00000003
 0x00000002
 0x00000003
 0x00000001

julia> d = decoder(v[3]);

julia> d(int(v)) == v
true

Warning:

It is not true that int(d(u)) == u always holds.

See also: int.

MLJModelInterface.fit (Function)
MLJModelInterface.fit(model, verbosity, data...) -> fitresult, cache, report

All models must implement a fit method. Here data is the output of reformat on user-provided data, or some resampling thereof. The fallback of reformat returns the user-provided data (e.g., a table).
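
As an illustration only, here is a minimal sketch of a fit implementation for a hypothetical deterministic regressor (the RidgeRegressor type below is not part of the interface):

import MLJModelInterface as MMI
using LinearAlgebra

mutable struct RidgeRegressor <: MMI.Deterministic   # hypothetical model type
    lambda::Float64
end

function MMI.fit(model::RidgeRegressor, verbosity, X, y)
    Xmat = MMI.matrix(X)                                # table -> matrix
    fitresult = (Xmat'Xmat + model.lambda*I) \ (Xmat'y) # ridge coefficients
    cache = nothing                                     # nothing to pass on to `update`
    report = (training_rows = size(Xmat, 1),)
    return fitresult, cache, report
end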

MLJModelInterface.fitted_params (Method)
fitted_params(model, fitresult) -> human_readable_fitresult # named_tuple

Models may overload fitted_params. The fallback returns (fitresult=fitresult,).

Other training-related outcomes should be returned in the report part of the tuple returned by fit.
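
Continuing the hypothetical RidgeRegressor sketch given under fit above, an overloading might look like this:

MMI.fitted_params(::RidgeRegressor, fitresult) = (coefficients = fitresult,)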

MLJModelInterface.int (Method)
int(x)

The positional integer of the CategoricalString or CategoricalValue x, in the ordering defined by the pool of x. The type of int(x) is the reference type of x.

Not to be confused with x.ref, which is unchanged by reordering of the pool of x, but has the same type.

int(X::CategoricalArray)
int(W::Array{<:CategoricalString})
int(W::Array{<:CategoricalValue})

Broadcasted versions of int.

julia> v = categorical(["c", "b", "c", "a"])
4-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "c"
 "b"
 "c"
 "a"

julia> levels(v)
3-element Vector{String}:
 "a"
 "b"
 "c"

julia> int(v)
4-element Vector{UInt32}:
 0x00000003
 0x00000002
 0x00000003
 0x00000001

See also: decoder.

MLJModelInterface.is_same_except (Method)
is_same_except(m1, m2, exceptions::Symbol...; deep_properties=Symbol[])

If both m1 and m2 are of MLJType, return true if the following conditions all hold, and false otherwise:

  • typeof(m1) === typeof(m2)

  • propertynames(m1) === propertynames(m2)

  • with the exception of properties listed as exceptions or bound to an AbstractRNG, each pair of corresponding property values is either "equal" or both undefined. (If a property appears as a propertyname but not a fieldname, it is deemed as always defined.)

The meaning of "equal" depends on the type of the property value:

  • values that are themselves of MLJType are "equal" if they are equal in the sense of is_same_except with no exceptions.

  • values that are not of MLJType are "equal" if they are ==.

In the special case of a "deep" property, "equal" has a different meaning; see deep_properties for details.

If m1 or m2 are not MLJType objects, then return ==(m1, m2).
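
A small illustration, using a hypothetical model type (not part of the interface):

import MLJModelInterface as MMI

mutable struct DummyClassifier <: MMI.Probabilistic   # hypothetical
    max_depth::Int
    min_gain::Float64
end

m1 = DummyClassifier(3, 0.0)
m2 = DummyClassifier(4, 0.0)

MMI.is_same_except(m1, m2)              # false: max_depth differs
MMI.is_same_except(m1, m2, :max_depth)  # true: differences in :max_depth are ignored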

MLJModelInterface.isrepresented (Method)
isrepresented(object::MLJType, objects)

Test if object has a representative in the iterable objects. This is a weaker requirement than object in objects.

Here we say m1 represents m2 if is_same_except(m1, m2) is true.

MLJModelInterface.matrix (Method)
matrix(X; transpose=false)

If X isa AbstractMatrix, return X or permutedims(X) if transpose=true. Otherwise if X is a Tables.jl compatible table source, convert X into a Matrix.
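
For example, assuming MLJBase (which supplies the data interface behind this method) is loaded:

using MLJBase

X = (x1 = [1.0, 2.0], x2 = [3.0, 4.0])   # a Tables.jl-compatible column table
matrix(X)
# 2×2 Matrix{Float64}:
#  1.0  3.0
#  2.0  4.0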

MLJModelInterface.metadata_model (Method)
metadata_model(T; args...)

Helper function to write the metadata for a model T.

Keywords

  • input_scitype=Unknown: allowed scientific type of the input data
  • target_scitype=Unknown: allowed scitype of the target (supervised)
  • output_scitype=Unknown: allowed scitype of the transformed data (unsupervised)
  • supports_weights=false: whether the model supports sample weights
  • supports_class_weights=false: whether the model supports class weights
  • load_path="unknown": where the model is (usually PackageName.ModelName)
  • human_name=nothing: human name of the model
  • supports_training_losses=nothing: whether the (necessarily iterative) model can report training losses
  • reports_feature_importances=nothing: whether the model reports feature importances

Example

metadata_model(KNNRegressor,
    input_scitype=MLJModelInterface.Table(MLJModelInterface.Continuous),
    target_scitype=AbstractVector{MLJModelInterface.Continuous},
    supports_weights=true,
    load_path="NearestNeighbors.KNNRegressor")

MLJModelInterface.metadata_pkg (Method)
metadata_pkg(T; args...)

Helper function to write the metadata for a package providing model T. Use it with broadcasting to define the metadata of the package providing a series of models.

Keywords

  • package_name="unknown" : package name
  • package_uuid="unknown" : package uuid
  • package_url="unknown" : package url
  • is_pure_julia=missing : whether the package is pure julia
  • package_license="unknown": package license
  • is_wrapper=false : whether the package is a wrapper

Example

metadata_pkg.((KNNRegressor, KNNClassifier),
    package_name="NearestNeighbors",
    package_uuid="b8a86587-4115-5ab1-83bc-aa920d37bbce",
    package_url="https://github.com/KristofferC/NearestNeighbors.jl",
    is_pure_julia=true,
    package_license="MIT",
    is_wrapper=false)

MLJModelInterface.params (Method)
params(m::MLJType)

Recursively convert any transparent object m into a named tuple, keyed on the fields of m. An object is transparent if MLJModelInterface.istransparent(m) == true. The named tuple is possibly nested because params is recursively applied to the field values, which themselves might be transparent.

Most objects of type MLJType are transparent.

julia> params(EnsembleModel(model=ConstantClassifier()))
(model = (target_type = Bool,),
 weights = Float64[],
 bagging_fraction = 0.8,
 rng_seed = 0,
 n = 100,
 parallel = true,)

MLJModelInterface.predict (Function)
predict(model, fitresult, new_data...)

Supervised and SupervisedAnnotator models must implement the predict operation. Here new_data is the output of reformat called on user-specified data.
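
Continuing the hypothetical RidgeRegressor sketch given under fit above:

MMI.predict(::RidgeRegressor, fitresult, Xnew) = MMI.matrix(Xnew) * fitresult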

MLJModelInterface.reformat (Method)
MLJModelInterface.reformat(model, args...) -> data

Models optionally overload reformat to define transformations of user-supplied data into some model-specific representation (e.g., from a table to a matrix). When implemented, the MLJ user can avoid repeating such transformations unnecessarily, and can additionally make use of more efficient row subsampling, which is then based on the model-specific representation of data, rather than the user-representation. When reformat is overloaded, selectrows(::Model, ...) must be as well (see selectrows). Furthermore, the model fit method(s), and operations, such as predict and transform, must be refactored to act on the model-specific representations of the data.

To implement the reformat data front-end for a model, refer to "Implementing a data front-end" in the MLJ manual.
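
A sketch of such a front-end for the hypothetical RidgeRegressor introduced under fit above (fit and predict would then be refactored to act on the matrix representation directly):

MMI.reformat(::RidgeRegressor, X, y) = (MMI.matrix(X), y)
MMI.reformat(::RidgeRegressor, X) = (MMI.matrix(X),)

MMI.selectrows(::RidgeRegressor, I, Xmatrix, y) = (view(Xmatrix, I, :), y[I])
MMI.selectrows(::RidgeRegressor, I, Xmatrix) = (view(Xmatrix, I, :),)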

MLJModelInterface.scitype (Method)
scitype(X)

The scientific type (interpretation) of X, distinct from its machine type.

Examples

julia> scitype(3.14)
Continuous

julia> scitype([1, 2, missing])
AbstractVector{Union{Missing, Count}} 

julia> scitype((5, "beige"))
Tuple{Count, Textual}

julia> using CategoricalArrays

julia> X = (gender = categorical(['M', 'M', 'F', 'M', 'F']),
            ndevices = [1, 3, 2, 3, 2]);

julia> scitype(X)
Table{Union{AbstractVector{Count}, AbstractVector{Multiclass{2}}}}

MLJModelInterface.select (Function)
select(X, r, c)

Select element(s) of a table or matrix at row(s) r and column(s) c. An object of the sink type of X (or a matrix) is returned unless c is a single integer or symbol. In that case a vector is returned, unless r is a single integer, in which case a single element is returned.

See also: selectrows, selectcols.
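
For example, assuming MLJBase supplies the data interface:

using MLJBase

X = (x1 = [10, 20, 30, 40], x2 = [:a, :b, :c, :d])

select(X, 2, :x2)            # :b (single row and single column, so an element)
select(X, 2:3, :x2)          # [:b, :c] (single column, so a vector)
select(X, 2:3, [:x1, :x2])   # a two-row table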

MLJModelInterface.selectcols (Function)
selectcols(X, c)

Select single or multiple columns from a matrix or table X. If c is an abstract vector of integers or symbols, then the object returned is a table of the preferred sink type of typeof(X). If c is a single integer or symbol, then an AbstractVector is returned.

MLJModelInterface.selectrows (Function)
selectrows(X, r)

Select single or multiple rows from a table, abstract vector or matrix X. If X is tabular, the object returned is a table of the preferred sink type of typeof(X), even if only a single row is selected.

If X is neither a table, an abstract vector, nor a matrix, then X is returned and r is ignored.

MLJModelInterface.selectrows (Method)
MLJModelInterface.selectrows(::Model, I, data...) -> sampled_data

A model overloads selectrows whenever it buys into the optional reformat front-end for data preprocessing. See reformat for details. The fallback assumes data is a tuple and calls selectrows(X, I) for each X in data, returning the results in a new tuple of the same length. This call makes sense when X is a table, abstract vector or abstract matrix. In the last two cases, a new object and not a view is returned.

MLJModelInterface.table (Method)
table(columntable; prototype=nothing)

Convert a named tuple of vectors or tuples, columntable, into a table of the "preferred sink type" of prototype. This is often the type of prototype itself, when prototype is a sink; see the Tables.jl documentation. If prototype is not specified, then a named tuple of vectors is returned.

table(A::AbstractMatrix; names=nothing, prototype=nothing)

Wrap an abstract matrix A as a Tables.jl compatible table with the specified column names (a tuple of symbols). If names are not specified, names=(:x1, :x2, ..., :xn) is used, where n=size(A, 2).

If a prototype is specified, then the matrix is materialized as a table of the preferred sink type of prototype, rather than wrapped. Note that if prototype is not specified, then matrix(table(A)) is essentially a no-op.
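
For example, assuming MLJBase supplies the data interface:

using MLJBase

A = [1 2 3; 4 5 6]
t = table(A)          # Tables.jl-compatible wrap of A, with columns :x1, :x2, :x3
selectcols(t, :x2)    # [2, 5]
matrix(t) == A        # true: wrapping and then unwrapping is essentially a no-op

table((age = [21, 25], height = [1.7, 1.8]))   # no prototype: named tuple returned as is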

MLJModelInterface.training_losses (Method)
MLJModelInterface.training_losses(model::M, report)

If M is an iterative model type which calculates training losses, implement this method to return an AbstractVector of the losses in historical order. If the model calculates scores instead, then the sign of the scores should be reversed.

The following trait overload is also required: MLJModelInterface.supports_training_losses(::Type{<:M}) = true.
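
A sketch for a hypothetical iterative model type MyBoostingModel whose fit method stores the per-iteration losses in its report under the key :losses (with MMI = MLJModelInterface, as in the sketches above):

MMI.training_losses(::MyBoostingModel, report) = report.losses
MMI.supports_training_losses(::Type{<:MyBoostingModel}) = true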

MLJModelInterface.update (Method)
MLJModelInterface.update(model, verbosity, fitresult, cache, data...)

Models may optionally implement an update method. The fallback calls fit.
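
A sketch of a warm-restart update for the hypothetical iterative model MyBoostingModel above, with an n_iterations hyper-parameter; the helper add_iterations and the cache layout are likewise hypothetical:

function MMI.update(model::MyBoostingModel, verbosity, fitresult, cache, X, y)
    old_model = cache.model    # the model as it was when last trained
    if MMI.is_same_except(model, old_model, :n_iterations) &&
            model.n_iterations >= old_model.n_iterations
        # continue training from the existing fitresult instead of starting over:
        fitresult = add_iterations(fitresult, model.n_iterations - old_model.n_iterations, X, y)
        return fitresult, (model = deepcopy(model),), NamedTuple()
    end
    return MMI.fit(model, verbosity, X, y)    # otherwise retrain from scratch
end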

StatisticalTraits.deep_properties (Function)
deep_properties(::Type{<:MLJType})

Given an MLJType subtype M, the value of this trait should be a tuple of any properties of M to be regarded as "deep".

When two instances of type M are to be tested for equality, in the sense of == or is_same_except, then the values of a "deep" property (whose values are assumed to be of composite type) are deemed to agree if all corresponding properties of those property values are ==.

Any property of M whose values are themselves of MLJType is automatically "deep" and should not be included in the trait return value.

See also is_same_except.

Example

Consider an MLJType subtype Foo, with a single field of type Bar which is not a subtype of MLJType:

mutable struct Bar
    x::Int
end

mutable struct Foo <: MLJType
    bar::Bar
end

Then the mutability of Bar implies Bar(1) != Bar(1) and so, by the definition of == for MLJType objects (see is_same_except), we have

Foo(Bar(1)) != Foo(Bar(1))

However, after the declaration

MLJModelInterface.deep_properties(::Type{<:Foo}) = (:bar,)

we have

Foo(Bar(1)) == Foo(Bar(1))

MLJModelInterface._model_cleaner (Method)
_model_cleaner(modelname, defaults, constraints)

Build the expression of the cleaner associated with the constraints specified in a model def.

MLJModelInterface._model_constructor (Method)
_model_constructor(modelname, params, defaults)

Build the expression of the keyword constructor associated with a model definition. When the constructor is called, the clean! function is called as well to check that parameter assignments are valid.

MLJModelInterface._process_model_def (Method)
_process_model_def(modl, ex)

Take an expression defining a model (mutable struct Model ...) and unpack key elements for further processing:

  • Model name (modelname)
  • Names of parameters (params)
  • Default values (defaults)
  • Constraints (constraints)

When no default field value is given, a heuristic is applied to guess an appropriate default (e.g., zero for a Float64 parameter). To this end, the specified type expression is evaluated in the module modl.

MLJModelInterface._unpack! (Method)
_unpack!(ex, rep)

Internal function that reads a constraint given after a parameter's default value and transforms it into an executable condition (which is returned, to be executed later). For instance, if we have

alpha::Int = 0.5::(arg > 0.0)

then (arg > 0.0) is transformed into (alpha > 0.0), which is executable.

MLJModelInterface.doc_header (Method)
MLJModelInterface.doc_header(SomeModelType; augment=false)

Return a string suitable for interpolation in the document string of an MLJ model type. In the example given below, the header expands to something like this:

FooRegressor

A model type for constructing a foo regressor, based on FooRegressorPkg.jl.

From MLJ, the type can be imported using

FooRegressor = @load FooRegressor pkg=FooRegressorPkg

Construct an instance with default hyper-parameters using the syntax model = FooRegressor(). Provide keyword arguments to override hyper-parameter defaults, as in FooRegressor(a=...).

Ordinarily, doc_header is used in document strings defined after the model type definition, as doc_header assumes model traits (in particular, package_name and package_url) to be defined; see also MLJModelInterface.metadata_pkg.

Example

Suppose a model type and traits have been defined by:

mutable struct FooRegressor
    a::Int
    b::Float64
end

metadata_pkg(FooRegressor,
    name="FooRegressorPkg",
    uuid="10745b16-79ce-11e8-11f9-7d13ad32a3b2",
    url="http://existentialcomics.com/",
    )
metadata_model(FooRegressor,
    input=Table(Continuous),
    target=AbstractVector{Continuous})

Then the docstring is defined after these declarations with the following code:

"""
$(MLJModelInterface.doc_header(FooRegressor))

### Training data

In MLJ or MLJBase, bind an instance `model` ...

<rest of doc string goes here>

"""
FooRegressor

Variation to augment existing document string

For models that have a native API with separate documentation, one may want to call doc_header(FooRegressor, augment=true) instead. In that case, the output will look like this:

From MLJ, the FooRegressor type can be imported using

FooRegressor = @load FooRegressor pkg=FooRegressorPkg

Construct an instance with default hyper-parameters using the syntax model = FooRegressor(). Provide keyword arguments to override hyper-parameter defaults, as in FooRegressor(a=...).

MLJModelInterface.feature_importances (Function)
feature_importances(model::M, fitresult, report)

For a given model of model type M supporting intrinsic feature importances, calculate the feature importances from the model's fitresult and report, as an abstract vector of feature::Symbol => importance::Real pairs (e.g., [:gender => 0.23, :height => 0.7, :weight => 0.1]).

New model implementations

The following trait overload is also required: MLJModelInterface.reports_feature_importances(::Type{<:M}) = true

If for some reason a model is sometimes unable to report feature importances, then feature_importances should return all importances as 0.0, as in [:gender => 0.0, :height => 0.0, :weight => 0.0].
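
A sketch for a hypothetical model type MyTreeModel whose fitresult records feature names and importance scores (with MMI = MLJModelInterface, as in the sketches above):

function MMI.feature_importances(::MyTreeModel, fitresult, report)
    return [f => s for (f, s) in zip(fitresult.features, fitresult.scores)]
end

MMI.reports_feature_importances(::Type{<:MyTreeModel}) = true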

MLJModelInterface.flat_params (Method)
flat_params(m::Model)

Deconstruct any Model instance model as a flat named tuple, keyed on property names. Properties of nested model instances are recursively exposed, as shown in the example below. For most Model objects, properties are synonymous with fields, but this is not a hard requirement.

julia> using MLJModels
julia> using EnsembleModels
julia> tree = (@load DecisionTreeClassifier pkg=DecisionTree)();

julia> flat_params(EnsembleModel(model=tree))
(model__max_depth = -1,
 model__min_samples_leaf = 1,
 model__min_samples_split = 2,
 model__min_purity_increase = 0.0,
 model__n_subfeatures = 0,
 model__post_prune = false,
 model__merge_purity_threshold = 1.0,
 model__display_depth = 5,
 model__feature_importance = :impurity,
 model__rng = Random._GLOBAL_RNG(),
 atomic_weights = Float64[],
 bagging_fraction = 0.8,
 rng = Random._GLOBAL_RNG(),
 n = 100,
 acceleration = CPU1{Nothing}(nothing),
 out_of_bag_measure = Any[],)

MLJModelInterface.report (Method)
MLJModelInterface.report(model, report_given_method)

Merge the reports in the dictionary report_given_method into a single property-accessible object. It is supposed that each key of the dictionary is either :fit or the name of an operation, such as :predict or :transform. Each value will be the report component returned by a training method (fit or update) dispatched on the model type, in the case of :fit, or the report component returned by an operation that supports reporting.

New model implementations

Overloading this method is optional, unless the model generates reports that are neither named tuples nor nothing.

Assuming each value in the report_given_method dictionary is either a named tuple or nothing, and there are no conflicts between the keys of the dictionary values (the individual reports), the fallback returns the usual named tuple merge of the dictionary values, ignoring any nothing value. If there is a key conflict, all operation reports are first wrapped in a named tuple of length one, as in (predict=predict_report,). A :fit report is never wrapped.

If any dictionary value is neither a named tuple nor nothing, it is first wrapped as (report=value, ) before merging.

MLJModelInterface.schema (Method)
schema(X)

Inspect the column types and scitypes of a tabular object. Returns nothing if the column types and scitypes can't be inspected.
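
For example, assuming MLJBase supplies the implementation:

using MLJBase, CategoricalArrays

X = (gender = categorical(['M', 'M', 'F']), ndevices = [1, 3, 2])

sch = schema(X)
sch.names      # (:gender, :ndevices)
sch.scitypes   # (Multiclass{2}, Count)
sch.types      # the corresponding machine types of the column elements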
