API

Functions

MLJ.EnsembleModel - Method
EnsembleModel(atom=nothing, weights=Float64[], bagging_fraction=0.8, rng=GLOBAL_RNG, n=100, parallel=true, out_of_bag_measure=[])

Create a model for training an ensemble of n learners, with optional bagging, each with associated model atom. Ensembling is useful if fit!(machine(atom, data...)) does not create identical models on repeated calls (i.e., is a stochastic model, such as a decision tree with randomized node selection criteria), or if bagging_fraction is set to a value less than 1.0, or both. The constructor fails if no atom is specified.

If rng is an integer, then MersenneTwister(rng) is the random number generator used for bagging. Otherwise some AbstractRNG object is expected.

Predictions are weighted according to the vector weights (to allow for external optimization) except in the case that atom is a Deterministic classifier. Uniform weights are used if weights has zero length.

The ensemble model is Deterministic or Probabilistic, according to the corresponding supertype of atom. In the case of deterministic classifiers (target_scitype_union(atom) <: Union{Multiclass,FiniteOrderedFactor}), the predictions are majority votes, and for regressors (target_scitype_union(atom) <: Continuous) they are ordinary averages. Probabilistic predictions are obtained by averaging the atomic probability distribution/mass functions; in particular, for regressors, the ensemble prediction on each input pattern has the type MixtureModel{VF,VS,D} from the Distributions.jl package, where D is the type of predicted distribution for atom.

If a single measure or non-empty vector of measures is specified by out_of_bag_measure, then out-of-bag estimates of performance are reported.
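
To illustrate (a sketch only, assuming data X, y and the RidgeRegressor atom used elsewhere on this page):

atom = RidgeRegressor()
ensemble = EnsembleModel(atom=atom, n=20, bagging_fraction=0.8, rng=123)
mach = machine(ensemble, X, y)
fit!(mach)
yhat = predict(mach, X)   # ordinary average of the 20 atomic predictions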

MLJ.TunedModel - Method
tuned_model = TunedModel(; model=nothing,
                         tuning=Grid(),
                         resampling=Holdout(),
                         measure=nothing,
                         operation=predict,
                         nested_ranges=NamedTuple(),
                         minimize=true,
                         full_report=true)

Construct a model wrapper for hyperparameter optimization of a supervised learner.

Calling fit!(mach) on a machine mach=machine(tuned_model, X, y) will: (i) instigate a search, over clones of model with the hyperparameter mutations specified by nested_ranges, for the model optimizing the specified measure, with evaluations carried out using the specified tuning and resampling strategies; and (ii) fit a machine, mach_optimal = mach.fitresult, wrapping the optimal model object in all the provided data X, y. Calling predict(mach, Xnew) then returns the predictions of mach_optimal on Xnew.

If measure is a score, rather than a loss, specify minimize=false.

The optimal clone of model is accessible as fitted_params(mach).best_model. In the case of two-parameter tuning, a Plots.jl plot of performance estimates is returned by plot(mach) or heatmap(mach).
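
For example, to tune the lambda hyperparameter of a ridge regressor (a sketch, assuming RidgeRegressor and data X, y are available, and that a flat named tuple suffices for an unnested hyperparameter):

atom = RidgeRegressor()
r_lambda = range(atom, :lambda, lower=0.01, upper=10, scale=:log10)
tuned = TunedModel(model=atom, tuning=Grid(resolution=10),
                   resampling=CV(), measure=rms,
                   nested_ranges=(lambda=r_lambda,))
mach = machine(tuned, X, y)
fit!(mach)
fitted_params(mach).best_model   # the optimal clone of atom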

MLJ.evaluate! - Method
evaluate!(mach, resampling=CV(), measure=nothing, operation=predict, verbosity=1)

Estimate the performance of a machine mach using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or vector.

Although evaluate! is mutating, mach.model and mach.args are preserved.

Resampling and testing are based exclusively on the data in rows, when rows is specified.

If no measure is specified, then default_measure(mach.model) is used, unless this default is nothing, in which case an error is thrown.
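
Typical usage (a sketch, assuming mach wraps a supervised model, and using the built-in rms and mav measures):

evaluate!(mach, resampling=Holdout(fraction_train=0.7), measure=rms)
evaluate!(mach, resampling=CV(nfolds=5), measure=[rms, mav])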

MLJ.iterator - Method
iterator(model::Model, param_iterators::NamedTuple)

Iterator over all models of type typeof(model) defined by param_iterators.

Each name in the nested :name => value pairs of param_iterators should be the name of a (possibly nested) field of model; and each element of flat_values(param_iterators) (the corresponding final values) is an iterator over values of one of those fields.
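
For example (a sketch, reusing the ensemble model defined above):

ensemble = EnsembleModel(atom=RidgeRegressor())
param_iterators = (n=[10, 20], atom=(lambda=[0.1, 1.0],))
models = iterator(ensemble, param_iterators)   # iterates over all 4 combinations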

See also iterator and params.

MLJ.learning_curve! - Method
curve = learning_curve!(mach; resolution=30, resampling=Holdout(), measure=rms, operation=predict, nested_range=nothing, n=1)

Given a supervised machine mach, returns a named tuple of objects needed to generate a plot of performance measurements, as a function of the single hyperparameter specified in nested_range. The tuple curve has the following keys: :parameter_name, :parameter_scale, :parameter_values, :measurements.

For n not equal to 1, multiple curves are computed, and the value of curve.measurements is an array, one column for each run. This is useful in the case of models with indeterminate fit-results, such as a random forest.

X, y = datanow()
atom = RidgeRegressor()
ensemble = EnsembleModel(atom=atom)
mach = machine(ensemble, X, y)
r_lambda = range(atom, :lambda, lower=0.1, upper=100, scale=:log10)
curve = MLJ.learning_curve!(mach; nested_range=(atom=(lambda=r_lambda,),))
using Plots
plot(curve.parameter_values, curve.measurements, xlab=curve.parameter_name, xscale=curve.parameter_scale)

Smart fitting applies. For example, if the model is an ensemble model, and the hyperparameter in question is n, then atomic models are progressively added to the ensemble, not recomputed from scratch for each new value of n.

atom.lambda=1.0
r_n = range(ensemble, :n, lower=2, upper=150)
curves = MLJ.learning_curve!(mach; nested_range=(n=r_n,), verbosity=3, n=5)
plot(curves.parameter_values, curves.measurements, xlab=curves.parameter_name)

MLJ.rmsp - Method

Root mean squared percentage loss
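
For reference, a sketch of one common definition, not copied from the implementation (which may, for example, skip patterns with zero targets):

using Statistics: mean
rmsp_sketch(ŷ, y) = sqrt(mean(((ŷ .- y) ./ y) .^ 2))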

MLJ.sources - Method
sources(N)

Return a list of all ultimate sources of a node N.

See also: node, source
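
A sketch in a learning network context (assuming data X, y and some supervised model atom):

Xs = source(X)
ys = source(y)
mach = machine(atom, Xs, ys)
yhat = predict(mach, Xs)   # a node
sources(yhat)              # list containing the ultimate sources Xs and ys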

MLJBase.info - Method

info(model, pkg=nothing)

Return the dictionary of metadata associated with model::String. If more than one package implements model then pkg::String will need to be specified.
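
For example (a sketch; the pkg argument is needed here only because, by assumption, more than one package implements the model):

info("RidgeRegressor", pkg="MultivariateStats")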

StatsBase.fit! - Method
fit!(mach::Machine; rows=nothing, verbosity=1)

Train the machine mach using the algorithm and hyperparameters specified by mach.model, using those rows of the wrapped data having indices in rows.

fit!(mach::NodalMachine; rows=nothing, verbosity=1)

A nodal machine is trained in the same way as a regular machine with one difference: Instead of training the model on the wrapped data indexed on rows, it is trained on the wrapped nodes called on rows, with calling being a recursive operation on nodes within a learning network.

MLJ.@curve - Macro

@curve

The code,

@curve var range code

evaluates code, replacing appearances of var therein with each value in range. The range and corresponding evaluations are returned as a tuple of arrays. For example,

@curve  x 1:3 (x^2 + 1)

evaluates to

([1,2,3], [2, 5, 10])

This is convenient for plotting functions using, eg, the Plots package:

plot(@curve x 1:3 (x^2 + 1))

A macro @pcurve parallelizes the same behaviour. A two-variable implementation is also available, operating as in the following example:

julia> @curve x [1,2,3] y [7,8] (x + y)
([1,2,3],[7 8],[8.0 9.0; 9.0 10.0; 10.0 11.0])

julia> ans[3]
3×2 Array{Float64,2}:
  8.0   9.0
  9.0  10.0
 10.0  11.0

N.B. The second range is returned as a row vector for consistency with the output matrix. This is also helpful when plotting, as in:

julia> u1, u2, A = @curve x range(0, stop=1, length=100) α [1,2,3] x^α
julia> u2 = map(u2) do α "α = "*string(α) end
julia> plot(u1, A, label=u2)

which generates three superimposed plots - of the functions x, x^2 and x^3 - each labelled with the exponents α = 1, 2, 3 in the legend.

SimpleDeterministicCompositeModel(;regressor=ConstantRegressor(), 
                          transformer=FeatureSelector())

Construct a composite model consisting of a transformer (Unsupervised model) followed by a Deterministic model. Mainly intended for internal testing.

Base.copy - Function
copy(params::NamedTuple, values=nothing)

Return a copy of params with new values. That is, flat_values(copy(params, values)) == values is true, while the nested keys remain unchanged.

If values is not specified, a deep copy is returned.
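
For example (a sketch):

params = (A=(x=2, y=3), B=9)
copy(params, (4, 5, 6))   # (A = (x = 4, y = 5), B = 6)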

Base.merge! - Method
merge!(tape1, tape2)

Incrementally appends to tape1 all elements in tape2, excluding any element previously added (or any element of tape1 in its initial state).

Base.range - Method
r = range(model, :hyper; values=nothing)

Defines a NominalRange object for a field hyper of model. Note that r is not directly iterable but iterator(r) iterates over values.

r = range(model, :hyper; upper=nothing, lower=nothing, scale=:linear)

Defines a NumericRange object for a field hyper of model. Note that r is not directly iterable, but iterator(r, n) iterates over n values between lower and upper values, according to the specified scale. The supported scales are :linear, :log, :log10, :log2. Values for Integer types are rounded (with duplicate values removed, resulting in possibly fewer than n values).

Alternatively, if a function f is provided as scale, then iterator(r, n) iterates over the values [f(x1), f(x2), ... , f(xn)], where x1, x2, ..., xn are linearly spaced between lower and upper.
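
For example (a sketch, reusing the ridge regressor appearing elsewhere on this page):

atom = RidgeRegressor()
r = range(atom, :lambda, lower=0.1, upper=100, scale=:log10)
iterator(r, 5)   # five values, logarithmically spaced between 0.1 and 100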

See also: iterator

MLJ.flat_keys - Method
flat_keys(params::NamedTuple)

Use dot-concatenation to express each possibly nested key of params in string form.

Example

julia> flat_keys((A=(x=2, y=3), B=9))
["A.x", "A.y", "B"]

MLJ.get_type - Method

get_type(T, field::Symbol)

Returns the type of the field field of DataType T. Not a type-stable function.

MLJ.scale - Method
MLJ.scale(r::ParamRange)

Return the scale associated with the ParamRange object r. The possible return values are: :none (for a NominalRange), :linear, :log, :log10, :log2, or :custom (if r.scale is a function).

MLJ.unwind - Method
unwind(iterators...)

Represent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.

Example

julia> iterators = ([1, 2], ["a","b"], ["x", "y", "z"]);
julia> MLJ.unwind(iterators...)
12×3 Array{Any,2}:
 1  "a"  "x"
 2  "a"  "x"
 1  "b"  "x"
 2  "b"  "x"
 1  "a"  "y"
 2  "a"  "y"
 1  "b"  "y"
 2  "b"  "y"
 1  "a"  "z"
 2  "a"  "z"
 1  "b"  "z"
 2  "b"  "z"

StratifiedKFold(strata, k)

Struct for stratified k-fold resampling: provide the strata and the number of partitions, k, and simply collect the object for the indices. Taken from MLBase (https://github.com/JuliaStats/MLBase.jl).

task = SupervisedTask(data=nothing, is_probabilistic=false, target=nothing, ignore=Symbol[], verbosity=1)

Construct a supervised learning task with input features X and target y, where: y is the column vector of data named target, if target is a single symbol, or a vector of tuples, if target is a vector of symbols; X consists of all remaining columns of data not named in ignore, and is a table unless it has only one column, in which case it is a vector.

task = SupervisedTask(X, y; is_probabilistic=false, input_is_multivariate=true, verbosity=1)

A more customizable constructor, this returns a supervised learning task with input features X and target y, where: X must be a table or vector, according to whether it is multivariate or univariate, while y must be a vector whose elements are scalars, or tuples of scalars (of constant length for ordinary multivariate predictions, and of variable length for sequence prediction). Table rows must correspond to patterns and columns to features.

X, y = task()

Returns the input X and target y of the task, also available as task.X and task.y.
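
A sketch, assuming df is a table with a :height column, and an :id column to be excluded from the input features:

task = SupervisedTask(data=df, target=:height, ignore=[:id])
X, y = task()   # X: table of the remaining columns; y: the :height column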

UnivariateNominal(prob_given_level)

A discrete univariate distribution whose finite support is the set of keys of the provided dictionary, prob_given_level. The dictionary values specify the corresponding probabilities, which must be nonnegative and sum to one.

UnivariateNominal(levels, p)

A discrete univariate distribution whose finite support is the elements of the vector levels, and whose corresponding probabilities are elements of the vector p.

levels(d::UnivariateNominal)

Return the levels of d.

d = UnivariateNominal(["yes", "no", "maybe"], [0.1, 0.2, 0.7])
pdf(d, "no") # 0.2
mode(d) # "maybe"
rand(d, 5) # ["maybe", "no", "maybe", "maybe", "no"]
d = fit(UnivariateNominal, ["maybe", "no", "maybe", "yes"])
pdf(d, "maybe") ≈ 0.5 # true
levels(d) # ["yes", "no", "maybe"]

If v is a CategoricalVector, then fit(UnivariateNominal, v) includes all levels in the pool of v in its support, assigning unseen levels probability zero.

task = UnsupervisedTask(data=nothing, ignore=Symbol[], verbosity=1)

Construct an unsupervised learning task with given input data, which should be a table or, in the case of univariate inputs, a single vector.

Rows of data must correspond to patterns and columns to features. Columns in data whose names appear in ignore are ignored.

X = task()

Return the input data in a form to be used in models.

container_type(X)

Return :table, :sparse, or :other, according to whether X is a supported table format, a supported sparse table format, or something else.

The first two formats, together with abstract vectors, support the MLJBase accessor methods selectrows, selectcols, select, nrows, schema, and union_scitypes.
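
For example (a sketch; a named tuple of equal-length vectors is a supported table format):

container_type((x=[1, 2, 3], y=["a", "b", "c"]))   # :table
container_type(rand(3, 3))                         # :other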

MLJBase.datanow - Method

Get some supervised data now!!

MLJBase.fitresult_type(m)

Returns the fitresult type of any supervised model (or model type) m, as declared in the model mutable struct declaration.

MLJBase.load_ames - Method

Load the full version of the well-known Ames Housing task.

Load a well-known public regression dataset with nominal features.

MLJBase.load_crabs - Method

Load a well-known crab classification dataset with nominal features.

MLJBase.load_iris - Method

Load a well-known public classification task with nominal features.

MLJBase.load_reduced_ames - Method

Load a reduced version of the well-known Ames Housing task, having six numerical and six categorical features.

MLJBase.matrix - Method
MLJBase.matrix(X)

Convert a table source X into a Matrix; or, if X is an AbstractMatrix, return X. Optimized for column-based sources.

If instead X is a sparse table, then a SparseMatrixCSC object is returned. The integer relabelling of column names follows the lexicographic ordering (as indicated by schema(X).names).

MLJBase.nrows - Method
nrows(X)

Return the number of rows in a table, sparse table, or abstract vector.

MLJBase.params - Method
params(m)

Recursively convert any object of subtype MLJType into a named tuple, keyed on the fields of m. The named tuple is possibly nested because params is recursively applied to the field values, which themselves might be MLJType objects.

Used, in particular, in the case that m is a model, to inspect its nested hyperparameters:

julia> params(EnsembleModel(atom=ConstantClassifier()))
(atom = (target_type = Bool,),
 weights = Float64[],
 bagging_fraction = 0.8,
 rng_seed = 0,
 n = 100,
 parallel = true,)

MLJBase.partition - Method
partition(rows::AbstractVector{Int}, fractions...; shuffle=false)

Splits the vector rows into a tuple of vectors whose lengths are given by the corresponding fractions of length(rows). The last fraction is not provided, as it is inferred from the preceding ones. So, for example,

julia> partition(1:1000, 0.2, 0.7)
(1:200, 201:900, 901:1000)

MLJBase.schema - Method
schema(X)

Returns a struct with properties names and types, with the obvious meanings. Here X is any table or sparse table.

MLJBase.scitype - Method
scitype(x)

Return the scientific type for scalar values that object x can represent. If x is a tuple, then Tuple{scitype.(x)...} is returned.

julia> scitype(4.5)
Continuous

julia> scitype("book")
Unknown

julia> scitype((1, 4.5))
Tuple{Count,Continuous}

julia> using CategoricalArrays
julia> v = categorical([:m, :f, :f])
julia> scitype(v[1])
Multiclass{2}

MLJBase.scitype_union - Method
scitype_union(A)

Return the type union, over all elements x generated by the iterable A, of scitype(x).
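
For example (a sketch, following the scitype examples above):

scitype_union([1, 2, 3])   # Count
scitype_union([1, 4.5])    # Union{Count,Continuous}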

MLJBase.scitypes - Method
scitypes(X)

Returns a named tuple keyed on the column names of the table X with values the corresponding scitype unions over a column's entries.

MLJBase.select - Method
select(X, r, c)

Select the element of a table or sparse table at row r and column c. In the case of sparse data with no entry at key (r, c), zero or missing is returned, depending on the value type.

See also: selectrows, selectcols

MLJBase.selectcols - Method
selectcols(X, c)

Select single or multiple columns from any table or sparse table X. If c is an abstract vector of integers or symbols, then the object returned is a table of the preferred sink type of typeof(X). If c is a single integer or symbol, then a Vector or CategoricalVector is returned.

MLJBase.selectrows - Method
selectrows(X, r)

Select single or multiple rows from any table, sparse table, or abstract vector X. If X is tabular, the object returned is a table of the preferred sink type of typeof(X), even if only a single row is selected.

MLJBase.table - Method
MLJBase.table(cols; prototype=cols)

Convert a named tuple of vectors cols into a table. The table type returned is the "preferred sink type" for prototype (see the Tables.jl documentation).

MLJBase.table(X::AbstractMatrix; names=nothing, prototype=nothing)

Convert an abstract matrix X into a table with names (a tuple of symbols) as column names, or with labels (:x1, :x2, ..., :xn) where n=size(X, 2), if names is not specified. If prototype=nothing, then a named tuple of vectors is returned.

Equivalent to table(cols, prototype=prototype) where cols is the named tuple of columns of X, with keys(cols) = names.
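
For example (a sketch):

A = rand(3, 2)
t = MLJBase.table(A, names=(:height, :weight))   # a named tuple of vectors
t.height == A[:, 1]   # true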

MLJBase.@constant - Macro
@constant x = value

Equivalent to const x = value but registers the binding thus:

MLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x

Registered objects get displayed using the variable name to which they were bound in calls to show(x), etc.

WARNING: As with any const declaration, binding x to a new value of the same type is not prevented and the registration will not be updated.

MLJBase.@more - Macro
@more

Entered at the REPL, equivalent to show(ans, 100). Use to get a recursive description of all fields of the last REPL value.

MLJBase._cummulative - Method
_cummulative(d::UnivariateNominal)

Return the cumulative probability vector [0, ..., 1] for the distribution d, using whatever ordering is used in the dictionary d.prob_given_level. Used only to implement random sampling from d.

MLJBase._rand - Method

_rand(p_cummulative)

Randomly sample the distribution with discrete support 1:n which has cumulative probability vector p_cummulative = [0, ..., 1] (of length n+1). Does not check the first and last elements of p_cummulative but does not use them either.

MLJBase._recursive_show - Method
_recursive_show(stream, object, current_depth, depth)

Generate a table of the field values of the MLJType object, displaying each value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:

  1. If f is itself a MLJType object, then its short form is shown, and _recursive_show generates a separate table for each of its field values (and so on, up to a depth of argument depth).

  2. Otherwise f is displayed as "(omitted T)" where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (i.e., MIME"plain/text") form of f is shown. To override this behaviour, overload the _show method for the type in question.


Display abbreviated versions of integers.

MLJBase.handle - Method

Return the abbreviated object id (as a string), or its registered handle (as a string) if this exists.

Index