API

Functions

MLJ.EnsembleModelMethod.
EnsembleModel(atom=nothing, 
              weights=Float64[],
              bagging_fraction=0.8,
              rng=GLOBAL_RNG, n=100,
              parallel=true,
              out_of_bag_measure=[])

Create a model for training an ensemble of n learners, with optional bagging, each with associated model atom. Ensembling is useful if fit!(machine(atom, data...)) does not create identical models on repeated calls (ie, is a stochastic model, such as a decision tree with randomized node selection criteria), or if bagging_fraction is set to a value less than 1.0, or both. The constructor fails if no atom is specified.

If rng is an integer, then MersenneTwister(rng) is the random number generator used for bagging. Otherwise some AbstractRNG object is expected.

Predictions are weighted according to the vector weights (to allow for external optimization) except in the case that atom is a Deterministic classifier. Uniform weights are used if weights has zero length.

The ensemble model is Deterministic or Probabilistic, according to the corresponding supertype of atom. In the case of deterministic classifiers (target_scitype_union(atom) <: Finite), the predictions are majority votes, and for regressors (target_scitype_union(atom)<: Continuous) they are ordinary averages. Probabilistic predictions are obtained by averaging the atomic probability distribution/mass functions; in particular, for regressors, the ensemble prediction on each input pattern has the type MixtureModel{VF,VS,D} from the Distributions.jl package, where D is the type of predicted distribution for atom.

If a single measure or non-empty vector of measures is specified by out_of_bag_measure, then out-of-bag estimates of performance are reported.
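The aggregation rules described above can be illustrated without MLJ. The following is a minimal sketch in plain Julia, not the library's implementation: the names majority_vote and weighted_mean are hypothetical, ties in the vote are not resolved in any principled way, and the weights are assumed to be normalized by their sum.

```julia
# Hypothetical helpers illustrating the aggregation rules; not part of MLJ.

# Majority vote over the atomic predictions for a single input pattern.
function majority_vote(predictions::AbstractVector)
    counts = Dict{eltype(predictions),Int}()
    for p in predictions
        counts[p] = get(counts, p, 0) + 1
    end
    best = first(predictions)
    for (class, c) in counts
        if c > counts[best]
            best = class
        end
    end
    return best
end

# Weighted average of atomic predictions; uniform weights if `weights` is empty.
function weighted_mean(predictions::AbstractVector{<:Real},
                       weights::AbstractVector{<:Real}=Float64[])
    isempty(weights) && return sum(predictions) / length(predictions)
    return sum(weights .* predictions) / sum(weights)
end
```

For example, majority_vote([:a, :b, :a]) picks :a, and weighted_mean([1.0, 3.0], [3.0, 1.0]) weights the first prediction three times as heavily as the second.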

source
MLJ.TunedModelMethod.
tuned_model = TunedModel(; model=nothing,
                         tuning=Grid(),
                         resampling=Holdout(),
                         measure=nothing,
                         operation=predict,
                         nested_ranges=NamedTuple(),
                         minimize=true,
                         full_report=true)

Construct a model wrapper for hyperparameter optimization of a supervised learner.

Calling fit!(mach) on a machine mach=machine(tuned_model, X, y) or mach=machine(tuned_model, task) will: (i) Instigate a search, over clones of model with the hyperparameter mutations specified by nested_ranges, for that model optimizing the specified measure, according to evaluations carried out using the specified tuning strategy and resampling strategy; and (ii) Fit a machine, mach_optimal = fitted_params(mach).best_model, wrapping the optimal model object in all the provided data X, y (or in task). Calling predict(mach, Xnew) then returns predictions on Xnew of the machine mach_optimal.

If measure is a score, rather than a loss, specify minimize=false.

In the case of two-parameter tuning, a Plots.jl plot of performance estimates is returned by plot(mach) or heatmap(mach).

source
MLJ.coerceMethod.
coerce(d::Dict, X)

Return a copy of the table X with columns named in the keys of d coerced to have scitype_union equal to the corresponding value.

source
MLJ.coerceMethod.
coerce(T, v::AbstractVector)

Coerce the machine types of elements of v to ensure the returned vector has T as its scitype_union, or Union{Missing,T}, if v has missing values.

julia> v = coerce(Continuous, [1, missing, 5])
3-element Array{Union{Missing, Float64},1}:
 1.0     
 missing
 5.0  

julia> scitype_union(v)
Union{Missing,Continuous}

See also scitype, scitype_union, scitypes

source
MLJ.evaluate!Method.
evaluate!(mach, resampling=CV(), measure=nothing, operation=predict, force=false, verbosity=1)

Estimate the performance of a machine mach using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or vector.

Although evaluate! is mutating, mach.model and mach.args are preserved.

Resampling and testing is based exclusively on data in rows, when specified.

If no measure is specified, then default_measure(mach.model) is used, unless this default is nothing, in which case an error is thrown.

source
MLJ.iteratorMethod.
iterator(model::Model, param_iterators::NamedTuple)

Iterator over all models of type typeof(model) defined by param_iterators.

Each name in the nested :name => value pairs of param_iterators should be the name of a (possibly nested) field of model; and each element of flat_values(param_iterators) (the corresponding final values) is an iterator over values of one of those fields.

See also iterator and params.

source
curve = learning_curve!(mach; resolution=30, resampling=Holdout(), measure=rms, operation=predict, nested_range=nothing, n=1)

Given a supervised machine mach, returns a named tuple of objects needed to generate a plot of performance measurements, as a function of the single hyperparameter specified in nested_range. The tuple curve has the following keys: :parameter_name, :parameter_scale, :parameter_values, :measurements.

For n not equal to 1, multiple curves are computed, and the value of curve.measurements is an array, one column for each run. This is useful in the case of models with indeterminate fit-results, such as a random forest.

X, y = datanow()
atom = RidgeRegressor()
ensemble = EnsembleModel(atom=atom)
mach = machine(ensemble, X, y)
r_lambda = range(atom, :lambda, lower=0.1, upper=100, scale=:log10)
curve = MLJ.learning_curve!(mach; nested_range=(atom=(lambda=r_lambda,),))
using Plots
plot(curve.parameter_values, curve.measurements, xlab=curve.parameter_name, xscale=curve.parameter_scale)

If the specified hyperparameter is the number of iterations in some iterative model (and that model has an appropriately overloaded MLJBase.update method) then training is not restarted from scratch for each increment of the parameter, ie the model is trained progressively.

atom.lambda=1.0
r_n = range(ensemble, :n, lower=2, upper=150)
curves = MLJ.learning_curve!(mach; nested_range=(n=r_n,), verbosity=3, n=5)
plot(curves.parameter_values, curves.measurements, xlab=curves.parameter_name)
source
MLJ.machinesMethod.
machines(N)

List all machines in the learning network terminating at node N.

source
MLJ.modelsMethod.
models(N::AbstractNode)

A vector of all models referenced by node N, each model appearing exactly once.

source
MLJ.modelsMethod.
models()

List all models as a dictionary indexed on package name. Models available for immediate use appear under the key "MLJ".

models(conditional)

Restrict results to package model pairs (m, p) satisfying conditional(info(m, pkg=p)) == true.

models(task::MLJTask)

List all models matching the specified task.

Example

To retrieve all probabilistic classifiers:

models(x -> x[:is_supervised] && x[:is_probabilistic]==true)

See also: localmodels

source
MLJ.originsMethod.
origins(N)

Return a list of all origins of a node N accessed by a call N(). These are the source nodes of the directed acyclic graph associated with the learning network terminating at N, if edges corresponding to training arguments are excluded. A Node object cannot be called on new data unless it has a unique origin.

Not to be confused with sources(N) which refers to the same graph but without the training edge deletions.

See also: node, source

source
MLJ.rmspMethod.

Root mean squared percentage loss

source
MLJ.set_params!Method.

set_params!(model, nested_params)

Mutate the possibly nested fields of model, as returned by params(model), by specifying a named tuple nested_params matching the pattern of params(model).

julia> rf = EnsembleModel(atom=DecisionTreeClassifier());
julia> params(rf)
(atom = (pruning_purity = 1.0,
         max_depth = -1,
         min_samples_leaf = 1,
         min_samples_split = 2,
         min_purity_increase = 0.0,
         n_subfeatures = 0.0,
         display_depth = 5,
         post_prune = false,
         merge_purity_threshold = 0.9,),
 weights = Float64[],
 bagging_fraction = 0.8,
 n = 100,
 parallel = true,
 out_of_bag_measure = Any[],)

julia> set_params!(rf, (atom = (max_depth = 2,), n = 200));
julia> params(rf)
(atom = (pruning_purity = 1.0,
         max_depth = 2,
         min_samples_leaf = 1,
         min_samples_split = 2,
         min_purity_increase = 0.0,
         n_subfeatures = 0.0,
         display_depth = 5,
         post_prune = false,
         merge_purity_threshold = 0.9,),
 weights = Float64[],
 bagging_fraction = 0.8,
 n = 200,
 parallel = true,
 out_of_bag_measure = Any[],)
source
MLJ.sourcesMethod.

sources(N::AbstractNode)

A vector of all sources referenced by calls N() and fit!(N). These are the sources of the directed acyclic graph associated with the learning network terminating at N.

Not to be confused with origins(N) which refers to the same graph with edges corresponding to training arguments deleted.

See also: origins, source

source
MLJ.supervisedMethod.
task = supervised(data=nothing, 
                  types=Dict(), 
                  target=nothing,  
                  ignore=Symbol[], 
                  is_probabilistic=false, 
                  verbosity=1)

Construct a supervised learning task with input features X and target y, where: y is the column vector from data named target, if this is a single symbol, or a vector of tuples, if target is a vector; X consists of all remaining columns of data not named in ignore, and is a table unless it has only one column, in which case it is a vector.

The data types of elements in a column of data named as a key of the dictionary types are coerced to have a scientific type given by the corresponding value. Possible values are Continuous, Multiclass, OrderedFactor and Count. So, for example, types=Dict(:x1=>Count) means elements of the column of data named :x1 will be coerced into integers (whose scitypes are always Count).

task = supervised(X, y; 
                  input_is_multivariate=true, 
                  is_probabilistic=false, 
                  verbosity=1)

A more customizable constructor, this returns a supervised learning task with input features X and target y, where: X must be a table or vector, according to whether it is multivariate or univariate, while y must be a vector whose elements are scalars, or tuples of scalars (of constant length for ordinary multivariate predictions, and of variable length for sequence prediction). Table rows must correspond to patterns and columns to features. Type coercion is not available for this constructor (but see also coerce).

X, y = task()

Returns the input X and target y of the task, also available as task.X and task.y.

source
MLJ.unsupervisedMethod.
task = unsupervised(data=nothing, types=Dict(), ignore=Symbol[], verbosity=1)

Construct an unsupervised learning task with given input data, which should be a table or, in the case of univariate inputs, a single vector.

The data types of elements in a column of data named as a key of the dictionary types are coerced to have a scientific type given by the corresponding value. Possible values are Continuous, Multiclass, OrderedFactor and Count. So, for example, types=Dict(:x1=>Count) means elements of the column of data named :x1 will be coerced into integers (whose scitypes are always Count).

Rows of data must correspond to patterns and columns to features. Columns in data whose names appear in ignore are ignored.

X = task()

Return the input data in form to be used in models.

See also scitype, scitype_union, scitypes.

source
MLJBase.infoMethod.

info(model, pkg=nothing)

Return the dictionary of metadata associated with model::String. If more than one package implements model then pkg::String will need to be specified.

source
MLJ.@curveMacro.

@curve

The code,

@curve var range code

evaluates code, replacing appearances of var therein with each value in range. The range and corresponding evaluations are returned as a tuple of arrays. For example,

@curve  x 1:3 (x^2 + 1)

evaluates to

([1,2,3], [2, 5, 10])

This is convenient for plotting functions using, eg, the Plots package:

plot(@curve x 1:3 (x^2 + 1))

A macro @pcurve parallelizes the same behaviour. A two-variable implementation is also available, operating as in the following example:

julia> @curve x [1,2,3] y [7,8] (x + y)
([1,2,3],[7 8],[8.0 9.0; 9.0 10.0; 10.0 11.0])

julia> ans[3]
3×2 Array{Float64,2}:
  8.0   9.0
  9.0  10.0
 10.0  11.0

N.B. The second range is returned as a row vector for consistency with the output matrix. This is also helpful when plotting, as in:

julia> u1, u2, A = @curve x range(0, stop=1, length=100) α [1,2,3] x^α
julia> u2 = map(u2) do α "α = "*string(α) end
julia> plot(u1, A, label=u2)

which generates three superimposed plots (of the functions x, x^2 and x^3), each labeled with the corresponding exponent α = 1, 2, 3 in the legend.
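To make the one-variable behaviour concrete, here is a minimal sketch of a macro with the same shape of output. It is not the MLJ implementation: the name @curve_sketch is hypothetical, and the sketch covers neither the two-variable form nor the parallel variant.

```julia
# Hypothetical re-implementation of the one-variable case; not MLJ's macro.
macro curve_sketch(var, range, code)
    # Build the anonymous function `var -> code`, evaluated in the caller's scope.
    f = esc(Expr(:->, var, code))
    r = esc(range)
    return quote
        let xs = collect($r)
            (xs, map($f, xs))
        end
    end
end

# Evaluate x^2 + 1 for each x in 1:3 and return the (inputs, outputs) pair.
result = @curve_sketch x 1:3 (x^2 + 1)
```

Wrapping the expression in an anonymous function keeps the substitution of var hygienic: only the caller's var is rebound, and the helper variable xs cannot clash with names in code.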

source

@from_network NewCompositeModel(fld1=model1, fld2=model2, ...) <= (Xs, N)
@from_network NewCompositeModel(fld1=model1, fld2=model2, ...) <= (Xs, ys, N)

Create, respectively, a new stand-alone unsupervised or supervised model type NewCompositeModel using a learning network as a blueprint. Here Xs, ys and N refer to the input source node, target source node and terminating node of the network. The model type NewCompositeModel is equipped with fields named :fld1, :fld2, ..., which correspond to component models model1, model2, ... appearing in the network (which must therefore be elements of models(N)). Deep copies of the specified component models are used as default values in an automatically generated keyword constructor for NewCompositeModel.

Return value: A new NewCompositeModel instance, with default field values.

For details and examples refer to the "Learning Networks" section of the documentation.

source
SimpleDeterministicCompositeModel(;regressor=ConstantRegressor(), 
                          transformer=FeatureSelector())

Construct a composite model consisting of a transformer (Unsupervised model) followed by a Deterministic model. Mainly intended for internal testing.

source
Base.copyFunction.
copy(params::NamedTuple, values=nothing)

Return a copy of params with new values. That is, flat_values(copy(params, values)) == values is true, while the nested keys remain unchanged.

If values is not specified a deep copy is returned.

source
Base.rangeMethod.
r = range(model, :hyper; values=nothing)

Defines a NominalRange object for a field hyper of model, assuming the field is not a subtype of Real. Note that r is not directly iterable but iterator(r) iterates over values.

r = range(model, :hyper; upper=nothing, lower=nothing, scale=:linear)

Defines a NumericRange object for a Real field hyper of model. Note that r is not directly iterable but iterator(r, n) iterates over n values between lower and upper values, according to the specified scale. The supported scales are :linear, :log, :log10, :log2. Values for Integer types are rounded (with duplicate values removed, resulting in possibly fewer than n values).

Alternatively, if a function f is provided as scale, then iterator(r, n) iterates over the values [f(x1), f(x2), ... , f(xn)], where x1, x2, ..., xn are linearly spaced between lower and upper.
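The effect of the scale argument can be sketched in plain Julia. The function scaled_values below is hypothetical and is not how MLJ implements iterator(r, n); it assumes that logarithmic scales interpolate linearly between the logarithms of lower and upper, and it omits the rounding applied to Integer fields.

```julia
# Hypothetical illustration of scale handling; not part of MLJ.
function scaled_values(lower::Real, upper::Real, n::Integer; scale=:linear)
    if scale isa Function
        # Custom scale: apply f to n linearly spaced points.
        return scale.(range(lower, stop=upper, length=n))
    elseif scale == :linear
        return collect(range(lower, stop=upper, length=n))
    elseif scale == :log10
        return 10 .^ range(log10(lower), stop=log10(upper), length=n)
    elseif scale == :log2
        return 2 .^ range(log2(lower), stop=log2(upper), length=n)
    elseif scale == :log
        return exp.(range(log(lower), stop=log(upper), length=n))
    else
        throw(ArgumentError("unsupported scale: $scale"))
    end
end
```

Under these assumptions, scaled_values(0.1, 100, 4; scale=:log10) yields values one decade apart, which is why a log scale is a natural choice for regularization parameters such as lambda.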

See also: iterator

source
Base.replaceMethod.
replace(W::MLJ.Node, a1=>b1, a2=>b2, ....)

Create a deep copy of a node W, and thereby replicate the learning network terminating at W, but replacing any specified sources and models a1, a2, ... of the original network with the specified targets b1, b2, ....

source
MLJ.flat_keysMethod.
 flat_keys(params::NamedTuple)

Use dot-concatenation to express each possibly nested key of params in string form.

Example

julia> flat_keys((A=(x=2, y=3), B=9))
["A.x", "A.y", "B"]
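The flattening of nested keys can be reproduced with a short recursion. This is a sketch only: flat_keys_sketch is a hypothetical name and the code is not taken from MLJ.

```julia
# Hypothetical re-implementation for illustration; not MLJ's flat_keys.
function flat_keys_sketch(params::NamedTuple, prefix::String="")
    keys_out = String[]
    for (k, v) in pairs(params)
        name = prefix * string(k)
        if v isa NamedTuple
            # Recurse into nested named tuples, extending the dotted prefix.
            append!(keys_out, flat_keys_sketch(v, name * "."))
        else
            push!(keys_out, name)
        end
    end
    return keys_out
end

nested = (A=(x=2, y=3), B=9)
flat = flat_keys_sketch(nested)
```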
source
MLJ.get_typeMethod.

get_type(T, field::Symbol)

Returns the type of the field field of DataType T. Not a type-stable function.

source
MLJ.rebind!Method.

rebind!(s::Source, X)

Attach new data X to an existing source node s.

source
MLJ.reset!Method.
reset!(N::Node)

Place the learning network terminating at node N into a state in which fit!(N) will retrain from scratch all machines in its dependency tape. Does not actually train any machine or alter fit-results. (The method simply resets m.state to zero, for every machine m in the network.)

source
MLJ.scaleMethod.
MLJ.scale(r::ParamRange)

Return the scale associated with the ParamRange object r. The possible return values are: :none (for a NominalRange), :linear, :log, :log10, :log2, or :custom (if r.scale is a function).

source
MLJ.treeMethod.
MLJ.tree(N::Node)

Return a description of the tree defined by the learning network terminating at node N.

source
MLJ.unwindMethod.
unwind(iterators...)

Represent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.

Example

julia> iterators = ([1, 2], ["a","b"], ["x", "y", "z"]);
julia> MLJ.unwind(iterators...)
12×3 Array{Any,2}:
 1  "a"  "x"
 2  "a"  "x"
 1  "b"  "x"
 2  "b"  "x"
 1  "a"  "y"
 2  "a"  "y"
 1  "b"  "y"
 2  "b"  "y"
 1  "a"  "z"
 2  "a"  "z"
 1  "b"  "z"
 2  "b"  "z"
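The same row ordering falls out of Base's Iterators.product, whose first iterator also cycles fastest. The sketch below uses a hypothetical name, unwind_sketch, and is not MLJ's implementation.

```julia
# Hypothetical re-implementation for illustration; not MLJ.unwind.
function unwind_sketch(iterators...)
    # Flatten the product column-major, so the first iterator varies fastest.
    combos = vec(collect(Iterators.product(iterators...)))
    A = Array{Any}(undef, length(combos), length(iterators))
    for (i, combo) in enumerate(combos), j in eachindex(combo)
        A[i, j] = combo[j]
    end
    return A
end

A = unwind_sketch([1, 2], ["a", "b"], ["x", "y", "z"])
```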
source
MLJBase.classesMethod.
classes(x)

All the categorical values in the same pool as x (including x), returned as a list, with an ordering consistent with the pool. Here x has CategoricalValue or CategoricalString type, and classes(x) is a vector of the same eltype.

Not to be confused with the levels of x.pool which have a different type. In particular, while x in classes(x) is always true, x in x.pool.levels is not true.

julia> v = categorical([:c, :b, :c, :a])
julia> levels(v)
3-element Array{Symbol,1}:
 :a
 :b
 :c
julia> classes(v[4])
3-element Array{CategoricalValue{Symbol,UInt32},1}:
 :a
 :b
 :c
MLJBase.color_offMethod.
color_off()

Suppress color and bold output at the REPL for displaying MLJ objects.

MLJBase.color_onMethod.
color_on()

Enable color and bold output at the REPL, for enhanced display of MLJ objects.

container_type(X)

Return :table, :sparse, or :other, according to whether X is a supported table format, a supported sparse table format, or something else.

The first two formats, together with abstract vectors, support the MLJBase accessor methods selectrows, selectcols, select, nrows, schema, and union_scitypes.

MLJBase.datanowMethod.

Get some supervised data now!!

MLJBase.decoderMethod.
d = decoder(x)

A callable object for decoding the integer representation of a CategoricalString or CategoricalValue sharing the same pool as x. (Here x is of one of these two types.) Specifically, one has d(int(y)) == y for all y in classes(x). One can also call d on integer arrays, in which case d is broadcast over all elements.

julia> v = categorical([:c, :b, :c, :a])
julia> int(v)
4-element Array{UInt32,1}:
 0x00000003
 0x00000002
 0x00000003
 0x00000001
julia> d = decoder(v[3])
julia> d(int(v)) == v
true
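The round-trip property d(int(y)) == y can be mimicked without CategoricalArrays by using a sorted vector of levels as a stand-in for the pool. Everything in this sketch (levels_sketch, int_sketch, decoder_sketch) is hypothetical and only illustrates the encode/decode relationship.

```julia
# Plain-Julia analogue of int/decoder; the "pool" is just a sorted level vector.
v = [:c, :b, :c, :a]
levels_sketch = sort(unique(v))          # ordering of the stand-in pool

# Position of a value in the pool ordering.
int_sketch(x) = findfirst(isequal(x), levels_sketch)

# Inverse map from position back to value.
decoder_sketch(i::Integer) = levels_sketch[i]

codes = int_sketch.(v)
decoded = decoder_sketch.(codes)
```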

See also: int, classes

MLJBase.intMethod.

int(x)

The positional integer of the CategoricalString or CategoricalValue x, in the ordering defined by the pool of x. The type of int(x) is the reference type of x.

Not to be confused with x.ref, which is unchanged by reordering of the pool of x, but has the same type.

int(X::CategoricalArray)
int(W::Array{<:CategoricalString})
int(W::Array{<:CategoricalValue})

Broadcasted versions of int.

julia> v = categorical([:c, :b, :c, :a])
julia> levels(v)
3-element Array{Symbol,1}:
 :a
 :b
 :c
julia> int(v)
4-element Array{UInt32,1}:
 0x00000003
 0x00000002
 0x00000003
 0x00000001

See also: decoder

MLJBase.load_amesMethod.

Load the full version of the well-known Ames Housing task.

Load a well-known public regression dataset with nominal features.

MLJBase.load_crabsMethod.

Load a well-known crab classification dataset with nominal features.

MLJBase.load_irisMethod.

Load a well-known public classification task with nominal features.

Load a reduced version of the well-known Ames Housing task, having six numerical and six categorical features.

MLJBase.matrixMethod.
MLJBase.matrix(X)

Convert a table source X into a Matrix; or, if X is an AbstractMatrix, return X. Optimized for column-based sources.

If instead X is a sparse table, then a SparseMatrixCSC object is returned. The integer relabelling of column names follows the lexicographic ordering (as indicated by schema(X).names).

MLJBase.nrowsMethod.
nrows(X)

Return the number of rows in a table, sparse table, or abstract vector.

MLJBase.paramsMethod.
params(m)

Recursively convert any transparent object m into a named tuple, keyed on the fields of m. A model is transparent if MLJBase.istransparent(m) == true. The named tuple is possibly nested because params is recursively applied to the field values, which themselves might be transparent.

Most objects of type MLJType are transparent.

julia> params(EnsembleModel(atom=ConstantClassifier()))
(atom = (target_type = Bool,),
 weights = Float64[],
 bagging_fraction = 0.8,
 rng_seed = 0,
 n = 100,
 parallel = true,)
MLJBase.partitionMethod.
partition(rows::AbstractVector{Int}, fractions...; shuffle=false)

Splits the vector rows into a tuple of vectors whose lengths are given by the corresponding fractions of length(rows). The last fraction is not provided, as it is inferred from the preceding ones. So, for example,

julia> partition(1:1000, 0.2, 0.7)
(1:200, 201:900, 901:1000)
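A sketch of the unshuffled case shows how the split points follow from cumulative fractions. The function partition_sketch is hypothetical; in particular, rounding split points to the nearest integer is an assumption and may differ from what MLJBase does.

```julia
# Hypothetical re-implementation without shuffling; not MLJBase.partition.
function partition_sketch(rows::AbstractVector{Int}, fractions...)
    n = length(rows)
    # Last index of each part, from the cumulative fractions (rounding assumed).
    stops = [round(Int, c * n) for c in cumsum(collect(fractions))]
    push!(stops, n)                       # the final part takes what remains
    starts = [1; stops[1:end-1] .+ 1]
    return Tuple(rows[a:b] for (a, b) in zip(starts, stops))
end

train, valid, test = partition_sketch(1:1000, 0.2, 0.7)
```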
MLJBase.schemaMethod.
schema(X)

Returns a struct with properties names, types with the obvious meanings. Here X is any table or sparse table.

MLJBase.scitypeMethod.
scitype(x)

Return the scientific type that an object x can represent, when appearing as an element of a table or vector used as input or target in fitting MLJ models.

julia> scitype(4.5)
Continuous

julia> scitype("book")
Unknown

julia> using CategoricalArrays
julia> v = categorical([:m, :f, :f])
julia> scitype(v[1])
Multiclass{2}

Note that scitype "commutes" with the formation of tuples or arrays, as these examples illustrate:

scitype((42, float(π), "Julia"))
Tuple{Count,Continuous,Unknown}
scitype(rand(7,3))
AbstractArray{Continuous,2}

For getting the union of the scitypes of all elements of an iterable, use scitype_union.

scitype_union(A)

Return the type union, over all elements x generated by the iterable A, of scitype(x).

MLJBase.scitypesMethod.
scitypes(X)

Returns a named tuple keyed on the column names of the table X with values the corresponding scitype unions over a column's entries.

MLJBase.selectMethod.
select(X, r, c)

Select the element of a table or sparse table at row r and column c. In the case of sparse data where the key (r, c) is absent, zero or missing is returned, depending on the value type.

See also: selectrows, selectcols

MLJBase.selectcolsMethod.
selectcols(X, c)

Select single or multiple columns from any table or sparse table X. If c is an abstract vector of integers or symbols, then the object returned is a table of the preferred sink type of typeof(X). If c is a single integer or symbol, then a Vector or CategoricalVector is returned.

MLJBase.selectrowsMethod.
selectrows(X, r)

Select single or multiple rows from any table, sparse table, or abstract vector X. If X is tabular, the object returned is a table of the preferred sink type of typeof(X), even if a single row is selected.

MLJBase.tableMethod.
MLJBase.table(cols; prototype=cols)

Convert a named tuple of vectors cols, into a table. The table type returned is the "preferred sink type" for prototype (see the Tables.jl documentation).

MLJBase.table(X::AbstractMatrix; names=nothing, prototype=nothing)

Convert an abstract matrix X into a table with names (a tuple of symbols) as column names, or with labels (:x1, :x2, ..., :xn) where n=size(X, 2), if names is not specified. If prototype=nothing, then a named tuple of vectors is returned.

Equivalent to table(cols, prototype=prototype) where cols is the named tuple of columns of X, with keys(cols) = names.
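For the default case prototype=nothing, the named tuple of columns can be built directly. The helper columns_sketch is hypothetical and only illustrates the default column labelling; it does not go through Tables.jl.

```julia
# Hypothetical illustration of the matrix-to-columns step; not MLJBase.table.
function columns_sketch(X::AbstractMatrix; names=nothing)
    n = size(X, 2)
    # Default labels are :x1, :x2, ..., :xn.
    colnames = names === nothing ? Tuple(Symbol(:x, j) for j in 1:n) : Tuple(names)
    return NamedTuple{colnames}(Tuple(X[:, j] for j in 1:n))
end

cols = columns_sketch([1 2; 3 4])
named = columns_sketch([1 2; 3 4]; names=(:a, :b))
```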

@constant x = value

Equivalent to const x = value but registers the binding thus:

MLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x

Registered objects get displayed using the variable name to which they were bound in calls to show(x), etc.

WARNING: As with any const declaration, binding x to a new value of the same type is not prevented and the registration will not be updated.

MLJBase.@moreMacro.
@more

Entered at the REPL, equivalent to show(ans, 100). Use to get a recursive description of all fields of the last REPL value.

@set_defaults ModelType(args...)
@set_defaults ModelType args

Create a keyword constructor for any type ModelType<:MLJBase.Model, using as default values those listed in args. These must include a value for every field, and in the order appearing in fieldnames(ModelType).

The constructor created calls MLJBase.clean!(model) on the instantiated object model and calls @warn message if message = MLJBase.clean!(model) is non-empty. Note that MLJBase.clean! has a trivial fallback defined for all subtypes of MLJBase.Model.

Example

mutable struct Foo
    x::Int
    y
end

@set_defaults Foo(1,2)

julia> Foo()
Foo(1, 2)

julia> Foo(x=1, y="house")
Foo(1, "house")

@set_defaults Foo [4, 5]

julia> Foo()
Foo(4, 5)

StratifiedKFold(strata,k)

Struct for stratified k-fold resampling. Provide the strata and the number of partitions k, then collect the object to obtain the indices. Taken from MLBase (https://github.com/JuliaStats/MLBase.jl).

_cummulative(d::UnivariateFinite)

Return the cumulative probability vector [0, ..., 1] for the distribution d, using whatever ordering is used in the dictionary d.prob_given_level. Used only to implement random sampling from d.

MLJBase._randMethod.

_rand(p_cummulative)

Randomly sample the distribution with discrete support 1:n which has cumulative probability vector p_cummulative=[0, ..., 1] (of length n+1). Does not check the first and last elements of p_cummulative but does not use them either.
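Sampling from a cumulative probability vector amounts to locating a uniform draw among the interior break points. The sketch below is an assumption about the general technique (inverse-CDF sampling), not a copy of MLJBase._rand; the name sample_index and the optional argument u are hypothetical.

```julia
# Hypothetical inverse-CDF sampler; only the interior entries of the vector are read.
function sample_index(p_cumulative::AbstractVector{<:Real}, u::Real=rand())
    n = length(p_cumulative) - 1          # support is 1:n
    for i in 1:n-1
        # Return the first interval whose upper break point exceeds u.
        u < p_cumulative[i+1] && return i
    end
    return n
end

p = [0.0, 0.2, 0.7, 1.0]                  # probabilities 0.2, 0.5, 0.3 on 1:3
draws = [sample_index(p) for _ in 1:10]
```

Passing u explicitly makes the mapping deterministic, which is convenient for checking the break points; for instance u = 0.5 falls in the second interval.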

_recursive_show(stream, object, current_depth, depth)

Generate a table of the field values of the MLJType object, displaying each value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:

  1. If f is itself a MLJType object, then its short form is shown and _recursive_show generates a separate table for each of its field values (and so on, up to a depth of argument depth).

  2. Otherwise f is displayed as "(omitted T)" where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (ie, MIME"text/plain") form of f is shown. To override this behaviour, overload the _show method for the type in question.

to display abbreviated versions of integers

MLJBase.handleMethod.

Return the abbreviated object id (as a string), or its registered handle (as a string) if this exists.

Index